ABSTRACT

A key problem in sensor networks is to decide which sensors 
to query when, in order to obtain the most useful information 
(e.g., for performing accurate prediction), subject to con- 
straints (e.g., on power and bandwidth). In many applications 
the utility function is not known a priori, must be learned 
from data, and can even change over time. Furthermore, for
large sensor networks, solving a centralized optimization problem
to select sensors is not feasible, and thus we seek a fully
distributed solution. In this paper, we present Distributed
Online Greedy (DOG), an efficient, distributed algorithm for 
repeatedly selecting sensors online, only receiving feedback 
about the utility of the selected sensors. We prove very strong 
theoretical no-regret guarantees that apply whenever the (un- 
known) utility function satisfies a natural diminishing returns 
property called submodularity. Our algorithm has extremely
low communication requirements, and scales well to large 
sensor deployments. We extend DOG to allow observation- 
dependent sensor selection. We empirically demonstrate the 
effectiveness of our algorithm on several real-world sensing 
tasks. 
C.2.1 [Computer-Communication Networks]: Network Architecture
and Design; G.3 [Probability and Statistics]: Experimental
Design; I.2.6 [AI]: Learning

General Terms 
Keywords

1. INTRODUCTION

A key challenge in deploying sensor networks for real- 
world applications such as environmental monitoring [19], 
building automation [25] and others is to decide when to acti- 
vate the sensors in order to obtain the most useful information 
from the network (e.g., accurate predictions at unobserved
locations) and to minimize power consumption. This sensor
selection problem has received considerable attention [1, 32, 
10], and algorithms with performance guarantees have been 
developed [1, 16]. However, many of the existing approaches 
make simplifying assumptions. Many approaches assume (1) 
that the sensors can perfectly observe a particular sensing 
region, and nothing outside the region [1]. This assumption 
does not allow us to model settings where multiple noisy 
sensors can help each other obtain better predictions. There 
are also approaches that base their notion of utility on more 
detailed models, such as improvement in prediction accuracy 
w.r.t. some statistical model [10] or detection performance 
[18]. However, most of these approaches make two crucial 
assumptions: (2) The model, upon which the optimization 
is based, is known in advance (e.g., based on domain knowl- 
edge or data from a pilot deployment) and (3), a centralized 
optimization selects the sensors (i.e., some centralized pro- 
cessor selects the sensors which obtain highest utility w.r.t. 
the model). We are not aware of any approach that simulta- 
neously addresses the three main challenges (1), (2) and (3) 
above and still provides theoretical guarantees. 

In this paper, we develop an efficient algorithm, called 
Distributed Online Greedy (DOG), which addresses these 
three central challenges. Prior work [17] has shown that 
many sensing tasks satisfy an intuitive diminishing returns 
property, submodularity, which states that activating a new 
sensor helps more if few sensors have been activated so far, 
and less if many sensors have already been activated. Our 
algorithm applies to any setting where the true objective is 
submodular [23], thus capturing a variety of realistic sensor 
models. Secondly, our algorithm does not require the model 
to be specified in advance: it learns to optimize the objective 
function in an online manner. Lastly, the algorithm is dis- 
tributed; the sensors decide whether to activate themselves 
based on local information. We analyze our algorithm in the 
no-regret model, proving convergence properties similar to 
the best bounds for any centralized solution. 

A bandit approach toward sensor selection. At the heart 
of our approach is a novel distributed algorithm for multiarmed
bandit (MAB) problems. In the classical multiarmed
bandit [24] setting, we picture a slot machine with multiple 
arms, where each arm generates a random payoff with un- 
known mean. Our goal is to devise a strategy for pulling 
arms to maximize the total reward accrued. The difference 
between the optimal arm's payoff and the obtained payoff 
is called the regret. Known algorithms can achieve average
per-round regret of $O(\sqrt{n \log n}/\sqrt{T})$, where $n$ is the number
of arms and $T$ the number of rounds (see, e.g., the survey of
[13]). Suppose we would like to, at every time step, select k 
sensors. The sensor selection problem can then be cast as a 
multiarmed bandit problem, where there is one arm for each 
possible set of k sensors, and the payoff is the accrued utility 
for the selected set. Since the number of possible sets, and
thus the number of arms, is exponentially large, the resulting
regret bound is $O(n^{k/2}\sqrt{\log n}/\sqrt{T})$, i.e., exponential in $k$.
However, when the utility function is submodular, the payoffs 
of these arms are correlated. Recent results [28] show that this 
correlation due to submodularity can be exploited by reducing 
the $n^k$-armed bandit problem to $k$ separate $n$-armed bandit
problems, with only a bounded loss in performance. Existing 
bandit algorithms, such as the widely used EXP3 algorithm 
[2], are centralized in nature. Consequently, the key challenge 
in distributed online submodular sensing is how to devise a 
distributed bandit algorithm. In Secs. 4 and 5, we develop a
distributed variant of EXP3 using novel algorithms to sample 
from and update a probability distribution in a distributed way. 
Roughly, we develop a scheme where each sensor maintains 
its own weight, and activates itself independently from all 
other sensors purely depending on this weight. 

Observation specific selection. A shortcoming of central- 
ized sensor selection is that the individual sensors' current 
measurements are not considered in the selection process. 
In many applications, obtaining sensor measurements is less 
costly than transmitting the measurements across the network. 
For example, cell phones used in participatory sensing [5] can 
inexpensively obtain measurements on a regular basis, but it 
is expensive to constantly communicate measurements over 
the network. In Sec. 6, we extend our distributed selection 
algorithm to activate sensors depending on their observations, 
and analyze the tradeoff between power consumption and the 
utility obtained under observation specific activation. 

Communication models. We analyze our algorithms under 
two models of communication cost: In the broadcast model, 
each sensor can broadcast a message to all other sensors at 
unit cost. In the star network model, messages can only be 
between a sensor and the base station, and each message has 
unit cost. In Sec. 4 we formulate and analyze a distributed 
algorithm for sensor selection under the simpler broadcast 
model. Then, in Sec. 5 we show how the algorithm can be
extended to the star network model.

Our main contributions. 

• Distributed EXP3, a novel distributed implementation 
of the classic multiarmed bandit algorithm. 

• Distributed Online Greedy (DOG) and LAZYDOG, 
novel algorithms for distributed online sensor selection, 
which apply to many settings, only requiring the utility 
function to be submodular. 

• OD-DOG, an extension of DOG to allow for observation- 
dependent selection. 

• We analyze our algorithm in the no-regret model and 
prove that it attains the optimal regret bounds attainable 
by any efficient centralized algorithm. 

• We evaluate our approach on several real-world sensing 
tasks including monitoring a 12,527 node network. 

Finally, while we do not consider multi-hop or general 
network topologies in this paper, we believe that the ideas 
behind our algorithms will likely prove valuable for sensor 
selection in those models as well. 

2. THE SENSOR SELECTION PROBLEM 

We now formalize the sensor selection problem. Suppose a 
network of sensors has been deployed at a set of locations V 
with the task of monitoring some phenomenon (e.g., tempera- 
ture in a building). Constraints on communication bandwidth 
or battery power typically require us to select a subset A of 
these sensors for activation, according to some utility func- 
tion. The activated sensors then send their data to a server 
(base station). We first review the traditional offline setting 
where the utility function is specified in advance, illustrating 
how submodularity allows us to obtain provably near-optimal 
selections. We then address the more challenging setting 
where the utility function must be learned from data in an 
online manner. 

2.1 The Offline Sensor Selection Problem 

A standard offline sensor selection algorithm chooses a 
set of sensors that maximizes a known sensing quality ob- 
jective function f(A), subject to some constraints, e.g., on 
the number of activated sensors. One possible choice for 
the sensing quality is based on prediction accuracy (we will 
discuss other possible choices later on). In many applications, 
measurements are correlated across space, which allows us to 
make predictions at the unobserved locations. For example, 
prior work [10] has considered the setting where a random 
variable $X_s$ is associated with every location $s \in V$, and
a joint probability distribution $P(X_V)$ models the correlation
between sensor values. Here, $X_V = [X_1, \dots, X_n]$ is
the random vector over all measurements. If some measurements
$X_A = x_A$ are obtained at a subset $A$ of locations, then
the conditional distribution $P(X_{V \setminus A} \mid X_A = x_A)$ allows
predictions at the unobserved locations, e.g., by predicting
$E[X_{V \setminus A} \mid X_A = x_A]$. Furthermore, this conditional distribution
quantifies the uncertainty in the prediction: Intuitively,
we would like to select sensors that minimize the predictive 
uncertainty. One way to quantify the predictive uncertainty is 
the mean squared prediction error, 

$$\mathrm{MSE}(X_{V \setminus A} \mid x_A) = \frac{1}{n} \sum_{s \in V \setminus A} E\big[(X_s - E[X_s \mid x_A])^2 \mid x_A\big].$$

In general, the measurements that the sensors $A$ will make are
not known in advance. Thus, we can base our optimization
on the expected mean squared prediction error,

$$\mathrm{EMSE}(A) = \int dp(x_A)\, \mathrm{MSE}(X_{V \setminus A} \mid x_A).$$

Equivalently, we can maximize the reduction in mean squared 
prediction error, 

$$f_{\mathrm{EMSE}}(A) = \mathrm{EMSE}(\emptyset) - \mathrm{EMSE}(A).$$

By definition, $f_{\mathrm{EMSE}}(\emptyset) = 0$, i.e., selecting no sensors yields no utility.
Furthermore, $f_{\mathrm{EMSE}}$ is monotonic: if $A \subseteq B \subseteq V$, then
$f_{\mathrm{EMSE}}(A) \le f_{\mathrm{EMSE}}(B)$, i.e., adding more sensors always
helps. This means that $f_{\mathrm{EMSE}}$ is maximized by the set of all
sensors $V$. However, in practice, we would like to select only
a small set of, e.g., at most $k$ sensors due to bandwidth and
power constraints:

$$A^* = \arg\max_{A} f_{\mathrm{EMSE}}(A) \quad \text{s.t. } |A| \le k.$$

Unfortunately, this optimization problem is NP-hard, so we
cannot expect to efficiently find the optimal solution. Fortunately,
it can be shown [9] that in many settings$^1$, the function
$f_{\mathrm{EMSE}}$ satisfies an intuitive diminishing returns property
called submodularity. A set function $f : 2^V \to \mathbb{R}$ is called
submodular if, for all $A \subseteq B \subseteq V$ and $s \in V \setminus B$, it holds
that $f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$. Many other
natural objective functions for sensor selection satisfy submodularity
as well [17]. For example, the sensing region
model where $f_{\mathrm{REG}}(A)$ is the total area covered by all sensors
$A$ is submodular. The detection model where $f_{\mathrm{DET}}(A)$
counts the expected number of targets detected by sensors $A$
is submodular as well.

1 For Gaussian models and conditional suppressor-freeness [9].

A fundamental result of Nemhauser et al. [23] is that for
monotone submodular functions, a simple greedy algorithm,
which starts with the empty set $A_0 = \emptyset$ and iteratively adds
the element

$$s_k = \arg\max_{s \in V \setminus A_{k-1}} f(A_{k-1} \cup \{s\}); \qquad A_k = A_{k-1} \cup \{s_k\}$$

which maximally improves the utility, obtains a near-optimal
solution: For the set $A_k$ it holds that

$$f(A_k) \ge (1 - 1/e) \max_{|A| \le k} f(A),$$

i.e., the greedy solution obtains at least a constant fraction of
$(1 - 1/e) \approx 63\%$ of the optimal value.
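The greedy rule above can be illustrated with a minimal Python sketch. The sensor names, the `REGIONS` table, and the `coverage` utility are hypothetical stand-ins for a sensing-region objective; any monotone submodular set function could be passed as `f` instead.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence


def greedy_select(
    ground_set: Sequence[Hashable],
    f: Callable[[FrozenSet[Hashable]], float],
    k: int,
) -> List[Hashable]:
    """Greedily pick up to k elements, adding the largest marginal gain each step."""
    selected: List[Hashable] = []
    for _ in range(min(k, len(ground_set))):
        base = frozenset(selected)
        base_value = f(base)
        best, best_gain = None, float("-inf")
        for s in ground_set:
            if s in base:
                continue
            gain = f(base | {s}) - base_value
            if gain > best_gain:  # ties go to the earliest element
                best, best_gain = s, gain
        selected.append(best)
    return selected


# Hypothetical sensing regions: each sensor covers a set of grid cells.
REGIONS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(active: FrozenSet[str]) -> float:
    """Number of cells covered by the active sensors (monotone submodular)."""
    covered = set()
    for s in active:
        covered |= REGIONS[s]
    return float(len(covered))


if __name__ == "__main__":
    chosen = greedy_select(list(REGIONS), coverage, k=2)
    print(chosen, coverage(frozenset(chosen)))
```

In this toy instance the greedy choice happens to be optimal; in general only the $(1 - 1/e)$ guarantee holds.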

One fundamental problem with this offline approach is that 
it requires the function / to be specified in advance, i.e., be- 
fore running the greedy algorithm. For the function /emse, 
this means that the probabilistic model P(Xy) needs to be 
known in advance. While for some applications some prior 
data, e.g., from pilot deployments, may be accessible, very 
often no such prior data is available. This leads to a "chicken- 
and-egg" problem, where sensors need to be activated to 
collect data in order to learn a model, but also the model is 
required to inform the sensor selection. This is akin to the
"exploration-exploitation tradeoff" in reinforcement learning
[2], where an agent needs to decide whether to explore and 
gather information about effectiveness of an action, or to 
exploit, i.e., choose actions known to be effective. In the 
following, we devise an online monitoring scheme based on 
this analogy. 

2.2 The Online Sensor Selection Problem 

We now consider the more challenging problem where the 
objective function is not specified in advance, and needs to be 
learned during the monitoring task. We assume that we intend 
to monitor the environment for a number T of time steps 
(rounds). In each round $t$, a set $S_t$ of sensors is selected, and
these sensors transmit their measurements to a server (base
station). The server then determines a sensing quality $f_t(S_t)$
quantifying the utility obtained from the resulting analysis.
For example, if our goal is spatial prediction, the server would
build a model based on the previously collected sensor data,
pick a random sensor $s$, make a prediction for the variable $X_s$,
and then compare the prediction $\mu_s$ with the sensor reading
$x_s$. The error $f_t = \sigma_s^2 - (\mu_s - x_s)^2$ is an unbiased estimate of
the reduction in EMSE. In the following analysis, we will only
assume that the objective functions $f_t$ are bounded (w.l.o.g.,
take values in $[0, 1]$), monotone, and submodular, and that we
have some way of computing $f_t(S)$ for any subset of sensors
$S$. Our goal is to maximize the total reward obtained by the
system over $T$ rounds, $\sum_{t=1}^{T} f_t(S_t)$.

We seek to develop a protocol for selecting the sets $S_t$
of sensors at each round, such that after a small number
of rounds the average performance of our online algorithm
converges to that of the offline strategy
(which knows the objective functions). We thus compare our
protocol against all strategies that can select a fixed set of
$k$ sensors for use in all of the rounds; the best such strategy
obtains reward $\max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S)$. The difference
between this quantity and what our protocol obtains is known
as its regret, and an algorithm is said to be no-regret if its
average regret tends to zero (or less)$^2$ as $T \to \infty$.

When k = 1, our problem is simply the well-studied mul- 
tiarmed bandit (MAB) problem, for which many no-regret 
algorithms are known [13]. For general k, because the aver- 
age of several submodular functions remains submodular, we 
can apply the result of Nemhauser et al. [23] (cf., Sec. 2.1) 
to prove that a simple greedy algorithm obtains a $(1 - 1/e)$
approximation to the optimal offline solution. Feige [12]
showed that this is optimal in the sense that obtaining a
$(1 - 1/e + \epsilon)$ approximation for any $\epsilon > 0$ is NP-hard. These
facts suggest that we cannot expect any efficient online algorithm
to converge to a solution better than
$(1 - 1/e) \max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S)$. We therefore define
the $(1 - 1/e)$-regret of a sequence of (possibly random) sets
$\{S_t\}_{t=1}^{T}$ as

$$R_T := (1 - 1/e) \cdot \max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S) - \sum_{t=1}^{T} E[f_t(S_t)],$$

where the expectation is taken over the distribution for each
$S_t$. We say an online algorithm producing a sequence of sets
has no-$(1 - 1/e)$-regret if $\limsup_{T \to \infty} R_T / T \le 0$.

3. CENTRALIZED ALGORITHM FOR ONLINE SENSOR SELECTION

Before developing the distributed algorithm for online sen- 
sor selection, we will first review a centralized algorithm 
which is guaranteed to achieve no-$(1 - 1/e)$-regret. In Sec.
4 we will show how this centralized algorithm can be imple- 
mented efficiently in a distributed manner. This algorithm 
starts with the greedy algorithm for a known submodular 
function mentioned in Sec. 2.1, and adapts it to the online 
setting. Doing so requires an online algorithm for selecting 
a single sensor as a subroutine, and we review such an algo- 
rithm in Sec. 3.1 before discussing the centralized algorithm 
for selecting multiple sensors in Sec. 3.2. 

3.1 Centralized Online Single Sensor Selection 

Let us first consider the case where k = 1, i.e., we would 
like to select one sensor at each round. This simpler problem 
can be interpreted as an instance of the multiarmed bandit
problem (as introduced in Sec. 2.2), where we have one arm
for each possible sensor. In this case, the EXP3 algorithm [2]
is a centralized solution for no-regret single sensor selection.
EXP3 works as follows: It is parameterized by a learning
rate $\eta$ and an exploration probability $\gamma$. It maintains a set of
weights $w_s$, one for each arm (sensor) $s$, initialized to 1. At
every round $t$, it will select each arm $s$ with probability

$$p_s = (1 - \gamma) \frac{w_s}{\sum_{s'} w_{s'}} + \frac{\gamma}{n},$$

i.e., with probability $\gamma$ it explores, picking an arm uniformly
at random, and with probability $(1 - \gamma)$ it exploits, picking an
arm $s$ with probability proportional to its weight $w_s$. Once an
arm $s$ has been selected, a feedback $r = f_t(\{s\})$ is obtained,
and the weight $w_s$ is updated to

$$w_s \leftarrow w_s \exp(\eta r / p_s).$$

Auer et al. [2] showed that with appropriately chosen learning
rate $\eta$ and exploration probability $\gamma$ it holds that the cumulative
regret $R_T$ of EXP3 is $O(\sqrt{T n \ln n})$, i.e., the average
regret $R_T / T$ converges to zero.

2 Formally, if $R_T$ is the total regret for the first $T$ rounds, no-regret
means $\limsup_{T \to \infty} R_T / T \le 0$.
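The EXP3 description above translates into a few lines of Python. The following is only a sketch under stated assumptions: the class name `Exp3`, the parameter values, and the three-sensor demo with fixed utilities are hypothetical, and the weight rescaling step is a numerical safeguard that is not part of the algorithm as described (it leaves all selection probabilities unchanged).

```python
import math
import random
from typing import List, Optional, Tuple


class Exp3:
    """EXP3 with learning rate eta and exploration probability gamma."""

    def __init__(self, n_arms: int, eta: float, gamma: float,
                 rng: Optional[random.Random] = None) -> None:
        self.n = n_arms
        self.eta = eta
        self.gamma = gamma
        self.weights = [1.0] * n_arms  # w_s, initialized to 1
        self.rng = rng or random.Random()

    def probabilities(self) -> List[float]:
        """Mixture of the weight-proportional and the uniform distribution."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self) -> Tuple[int, float]:
        """Draw one arm; also return the probability it was drawn with."""
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm: int, reward: float, prob: float) -> None:
        """Exponential update with the importance-weighted reward r / p."""
        self.weights[arm] *= math.exp(self.eta * reward / prob)
        # Rescaling all weights leaves the probabilities unchanged and
        # avoids floating-point overflow on long runs (an added safeguard).
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def run_demo(rounds: int = 2000, seed: int = 0) -> List[float]:
    """Hypothetical 3-sensor instance with fixed utilities in [0, 1]."""
    utilities = [0.2, 0.2, 1.0]
    bandit = Exp3(n_arms=3, eta=0.05, gamma=0.1, rng=random.Random(seed))
    for _ in range(rounds):
        arm, prob = bandit.select()
        bandit.update(arm, utilities[arm], prob)
    return bandit.probabilities()
```

Because every arm keeps probability at least $\gamma/n$, the exponent $\eta r / p_s$ stays bounded by $\eta n / \gamma$, which is why the exploration term matters for the stability of the update.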

3.2 Centralized Selection of Multiple Sensors 

In principle, we could interpret the sensor selection problem 
as a $\binom{n}{k}$-armed bandit problem, and apply existing no-regret
algorithms such as EXP3. Unfortunately, this approach does 
not scale, since the number of arms grows exponentially with 
k. However, in contrast to the traditional multiarmed bandit 
problem, where the arms are assumed to have independent 
payoffs, in the sensor selection case, the utility function is 
submodular and thus the payoffs are correlated across different
sets. Recently, Streeter and Golovin showed how this
submodularity can be exploited, and developed a no-$(1 - 1/e)$-regret
algorithm for online maximization of submodular functions
[28]. The key idea behind their algorithm, OG_UNIT, is
to turn the offline greedy algorithm into an online algorithm
by replacing the greedy selection of the element $s_k$ that maximizes
the benefit, $s_k = \arg\max_s f(\{s_1, \dots, s_{k-1}\} \cup \{s\})$,
by a bandit algorithm. As shown in the pseudocode below,
OG_UNIT maintains $k$ bandit algorithms, one for each
sensor to be selected. At each round $t$, it selects $k$ sensors
according to the choices of the $k$ bandit algorithms
$\mathcal{E}_i$.$^3$ Once the elements have been selected, the $i$-th bandit
algorithm $\mathcal{E}_i$ receives as feedback the incremental benefit
$f_t(\{s_1, \dots, s_i\}) - f_t(\{s_1, \dots, s_{i-1}\})$, i.e., how much additional
utility is obtained by adding sensor $s_i$ to the set of already
selected sensors. Below we define $[m] := \{1, 2, \dots, m\}$.

Algorithm OG_UNIT from [28]:

Initialize $k$ multiarmed bandit algorithms $\mathcal{E}_1, \mathcal{E}_2, \dots, \mathcal{E}_k$,
each with action set $V$.
For each round $t \in [T]$
    For each stage $i \in [k]$ in parallel
        $\mathcal{E}_i$ selects an action $v_i^t$
    For each $i \in [k]$ in parallel
        feedback $f_t(\{v_j^t : j \le i\}) - f_t(\{v_j^t : j < i\})$ to $\mathcal{E}_i$.
    Output $S_t = \{v_1^t, v_2^t, \dots, v_k^t\}$.

In [27] it is shown that OG_UNIT has a $(1 - 1/e)$-regret bound
of $O(kR)$ in this feedback model, assuming each $\mathcal{E}_i$ has expected
regret at most $R$. Thus, when using EXP3 as a subroutine,
OG_UNIT has no-$(1 - 1/e)$-regret.

3 Bandits with duplicate choices are handled in Sec. 4.6.1 of [28].
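A single round of the centralized scheme can be sketched in Python as follows. This is an illustration rather than the implementation of [28]: the compact `Exp3` learner repeats the update rule of Sec. 3.1, duplicate choices are simply merged by set union, and the `REGIONS` coverage utility and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple


class Exp3:
    """Compact EXP3 learner following the update rule of Sec. 3.1."""

    def __init__(self, n: int, eta: float, gamma: float, rng: random.Random):
        self.n, self.eta, self.gamma, self.rng = n, eta, gamma, rng
        self.w = [1.0] * n
        self.last = (0, 1.0)  # (arm, probability) of the most recent choice

    def select(self) -> int:
        z = sum(self.w)
        p = [(1 - self.gamma) * w / z + self.gamma / self.n for w in self.w]
        arm = self.rng.choices(range(self.n), weights=p)[0]
        self.last = (arm, p[arm])
        return arm

    def feedback(self, reward: float) -> None:
        arm, prob = self.last
        self.w[arm] *= math.exp(self.eta * reward / prob)
        top = max(self.w)  # rescale for numerical safety only
        self.w = [w / top for w in self.w]


def og_unit_round(
    bandits: List[Exp3], f_t: Callable[[FrozenSet[int]], float]
) -> Tuple[FrozenSet[int], List[float]]:
    """One round: every stage picks a sensor, then receives its marginal gain."""
    choices = [b.select() for b in bandits]
    prefix: FrozenSet[int] = frozenset()
    value = f_t(prefix)
    gains: List[float] = []
    for bandit, v in zip(bandits, choices):
        prefix = prefix | {v}
        new_value = f_t(prefix)
        gains.append(new_value - value)     # incremental benefit of stage i
        bandit.feedback(new_value - value)
        value = new_value
    return prefix, gains


# Hypothetical instance: sensor -> covered cells, utility scaled to [0, 1].
REGIONS = {0: set(range(1, 7)), 1: set(range(7, 11)), 2: {1}, 3: {7}}


def coverage(active: FrozenSet[int]) -> float:
    covered = set()
    for s in active:
        covered |= REGIONS[s]
    return len(covered) / 10.0


def run_demo(rounds: int = 3000, k: int = 2, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    bandits = [Exp3(len(REGIONS), eta=0.1, gamma=0.1, rng=rng) for _ in range(k)]
    rewards = []
    for _ in range(rounds):
        chosen, _ = og_unit_round(bandits, coverage)
        rewards.append(coverage(chosen))
    return rewards
```

The per-stage gains telescope, so their sum equals the utility of the selected set whenever the empty set has utility zero; this is what lets $k$ separate learners be credited for one joint reward.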



Unfortunately, EXP3 (and in fact every MAB algorithm with
no-regret guarantees for non-stochastic reward functions) requires
sampling from some distribution with weights associated
with the sensors. If $n$ is small, we could simply store
these weights on the server, and run the bandit algorithms $\mathcal{E}_i$
there. However, this solution does not scale to large numbers 
of sensors. Thus the key problem for online sensor selection is 
to develop a multiarmed bandit algorithm which implements 
distributed sampling across the network, with minimal over- 
head of communication. In addition, the algorithm needs to 
be able to maintain the distributions (the weights) associated 
with each $\mathcal{E}_i$ in a distributed fashion.

4. DISTRIBUTED ALGORITHM FOR 
ONLINE SENSOR SELECTION 

We will now develop DOG, an efficient algorithm for dis- 
tributed online sensor selection. For now we make the follow- 
ing assumptions: 

1. Each sensor $v \in V$ is able to compute its contribution
to the utility $f_t(S \cup \{v\}) - f_t(S)$, where $S$ is the subset
of sensors that have already been selected.

2. Each sensor can broadcast to all other sensors. 

3. The sensors have calibrated clocks and unique, linearly 
ordered identifiers. 

These assumptions are reasonable in many applications: 
(1) In target detection, for example, the objective function 
$f_t(S)$ counts the number of targets detected by the sensors
$S$. Once previously selected sensors have broadcast which
targets they detected, the new sensor $s$ can determine how
many additional targets have been detected. Similarly, in statistical
estimation, one sensor (or a small number of sensors)
randomly activates each round and broadcasts its value. After 
sensors S have been selected and announced their measure- 
ments, the new sensor s can then compute the improvement 
in prediction accuracy over the previously collected data. (2) 
The assumption that broadcasts are possible may be realistic 
for dense deployments and fairly long range transmissions. 
In Sec. 5 we will show how assumptions (1) and (2) can be 
relaxed. 

As we have seen in Sec. 3, the key insight in developing 
a centralized algorithm for online selection is to replace the 
greedy selection of the sensor which maximally improves the 
total utility over the set of previously selected sensors by a 
bandit algorithm. Thus, a natural approach for developing a 
distributed algorithm for sensor selection is to first consider 
the single sensor case. 

4.1 Distributed Selection of a Single Sensor 

The key challenge in developing a distributed version of 
EXP3 is to find a way to sample exactly one element from a
probability distribution $p$ over sensors in a distributed manner.
This problem is distinct from randomized leader election [22], 
where the objective is to select exactly one element but the ele- 
ment need not be drawn from a specified distribution. We note 
that under the multi-hop communication model, sampling one 
element from the uniform distribution given a rooted span- 
ning tree can be done via a simple random walk [20], but that 
under the broadcast and star network models this approach de- 
generates to centralized sampling. Our algorithm, in contrast, 
samples from an arbitrary distribution by allowing sensors to 
individually decide to activate. Our bottom-up approach also 
has two other advantages: (1) it is amenable to modification 
of the activation probabilities based on local observations, as 
we discuss in Sec. 6, and (2) since it does not rely on any 
global state of the network such as a spanning tree, it can 
gracefully cope with significant edge or node failures. 

A naive distributed sampling scheme. A naive distributed 
algorithm would be to let each sensor keep track of all ac- 
tivation probabilities p. Then, one sensor (e.g., with the 
lowest identifier) would broadcast a single random number 
$u$ uniformly distributed in $[0,1]$, and the sensor $v$ for which
$\sum_{i=1}^{v-1} p_i \le u < \sum_{i=1}^{v} p_i$ would activate. However, for large
sensor network deployments, this algorithm would require 
each sensor to store a large amount of global information (all 
activation probabilities p). Instead, each sensor v could store 
only their own probability mass p v ; the sensors would then, 
in order of their identifiers, broadcast their probabilities p v , 
and stop once the sum of the probabilities exceeds u. This 
approach only requires a constant amount of local information,
but requires an impractical $\Theta(n)$ messages to be sent,
and sent sequentially over $\Theta(n)$ time steps.
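To make the cost of the sequential variant concrete, the following Python sketch walks the identifiers in order and counts broadcasts. The function name `sequential_select` and the convention that sensors are indexed from 0 are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def sequential_select(p: Sequence[float], u: float) -> Tuple[int, int]:
    """Return (index of the activated sensor, number of broadcasts used).

    Sensors announce p_v in identifier order until the running sum exceeds u.
    """
    running = 0.0
    for v, p_v in enumerate(p):
        running += p_v
        if u < running:
            return v, v + 1
    # Guard against rounding when the probabilities sum to slightly below 1.
    return len(p) - 1, len(p)
```

Under a uniform distribution the expected number of broadcasts is about $n/2$, which illustrates the linear message cost noted above.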

Distributed multinomial sampling. In this section we 
present a protocol that requires only $O(1)$ messages in expectation,
and only a constant amount of local information.

For a sampling procedure with input distribution $p$, we let
$\hat{p}$ denote the resulting distribution, where in all cases at most
one sensor is selected, and nothing is selected with probability
$1 - \sum_v \hat{p}_v$. A simple approach towards distributed sampling
would be to activate each sensor $v \in V$ independently from
each other with probability $p_v$. While in expectation exactly
one sensor is activated, with probability $\prod_v (1 - p_v) > 0$
no sensor is activated; also, since sensors are activated independently,
there is a nonzero probability that more than one
sensor is activated. Using a synchronized clock, the sensors
could determine if no sensor is activated. In this case, they
could simply repeat the selection procedure until at least one
sensor is activated. One naive approach would be to repeat
the selection procedure until exactly one sensor is activated.
However, with two sensors and $p_1 = \epsilon$, $p_2 = 1 - \epsilon$, this algorithm
yields $\hat{p}_1 = \epsilon^2 / (1 - 2\epsilon + 2\epsilon^2) = O(\epsilon^2)$, so the first
sensor is severely underrepresented. Another simple protocol
would be to select exactly one sensor uniformly at random
from the set of activated sensors, which can be implemented
using few messages.



The Simple Protocol:

For each sensor $v$ in parallel
    Sample $X_v \sim \mathrm{Bernoulli}(p_v)$.
    If $X_v = 1$, sensor $v$ activates.
All active sensors $S$ coordinate to select a single sensor
uniformly at random from $S$, e.g., by electing the
minimum ID sensor in $S$ to do the sampling.



It is not hard to show that with this protocol, for all sensors $v$,

$$\hat{p}_v = p_v \cdot E\left[\frac{1}{|S|} \,\middle|\, v \in S\right] \ge p_v / E\big[|S| \mid v \in S\big] \ge p_v / 2$$

by appealing to Jensen's inequality. Since $\hat{p}_v \le p_v$, we find
that this simple protocol maintains a ratio $r_v := \hat{p}_v / p_v \in
[1/2, 1]$. Unfortunately, this analysis is tight, as can be seen
from the example with two sensors and $p_1 = \epsilon$, $p_2 = 1 - \epsilon$.
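The tightness of the factor 1/2 can be checked numerically. The sketch below computes the exact output distribution of the simple protocol by enumerating all activation patterns, so it is only practical for a handful of sensors; the function name is hypothetical.

```python
from itertools import product
from typing import List, Sequence


def simple_protocol_distribution(p: Sequence[float]) -> List[float]:
    """Exact selection probabilities of the simple protocol for input p."""
    n = len(p)
    p_hat = [0.0] * n
    for pattern in product([0, 1], repeat=n):
        # Probability of this exact set of independently activated sensors.
        prob = 1.0
        for active, p_v in zip(pattern, p):
            prob *= p_v if active else (1.0 - p_v)
        size = sum(pattern)
        if size == 0:
            continue  # nothing activated, nothing selected
        for v in range(n):
            if pattern[v]:
                p_hat[v] += prob / size  # uniform choice among active sensors
    return p_hat


if __name__ == "__main__":
    eps = 0.01
    p_hat = simple_protocol_distribution([eps, 1 - eps])
    print(p_hat[0] / eps, p_hat[1] / (1 - eps))
```

For two sensors the first ratio works out to $(1 + \epsilon)/2$, which approaches 1/2 as $\epsilon \to 0$, while the second ratio approaches 1; the two sensors are therefore distorted by different factors.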

To improve upon the simple protocol, first consider running
it on an example with $p_1 = p_2 = \dots = p_n = 1/n$. Since
the protocol behaves exactly the same under permutations of
sensor labels, by symmetry we have $\hat{p}_1 = \hat{p}_2 = \dots = \hat{p}_n$, and
thus $r_i = r_j$ for all $i, j$. Now consider an input distribution
$p$ where there exist integers $N$ and $k_1, k_2, \dots, k_n$ such that
$p_v = k_v / N$ for all $v$. Replace each $v$ with $k_v$ fictitious
sensors, each with probability mass $1/N$, and each with a
label indicating $v$. Run the simple protocol with the fictitious
sensors, selecting a fictitious sensor $v'$, and then actually
select the sensor indicated by the label of $v'$. By symmetry
this process selects each fictitious sensor with probability
$(1 - \beta)/N$, where $\beta$ is the probability that nothing at all is
selected, and thus the process selects sensor $v$ with probability
$k_v (1 - \beta)/N = (1 - \beta) p_v$ (since at most one fictitious sensor
is ever selected).

We may thus consider the following improved protocol 
which incorporates the above idea, simulating this modification
to the protocol exactly when $p_v = k_v / N$ for all $v$.



The Improved Protocol($N$):

For each sensor $v$ in parallel
    Sample $X_v \sim \mathrm{Binomial}(\lceil N \cdot p_v \rceil, 1/N)$.
    If $X_v \ge 1$, then activate sensor $v$.
From the active sensors $S$, select sensor $v$ with probability
$X_v / \sum_{v' \in S} X_{v'}$.



This protocol ensures the ratios $r_v := \hat{p}_v / p_v$ are the same
for all sensors, provided each $p_v$ is a multiple of $1/N$. Assuming
the probabilities are rational, there will be a sufficiently
large $N$ to satisfy this condition. To reduce $\beta :=
\Pr[S = \emptyset]$ in the simple protocol, we may sample each $X_v$
from $\mathrm{Bernoulli}(\alpha \cdot p_v)$ for any $\alpha \in [1, n]$. The symmetry
argument remains unchanged. This in turn suggests sampling
$X_v$ from $\mathrm{Binomial}(\lceil N \cdot p_v \rceil, \alpha/N)$ in the improved protocol.
Taking the limit as $N \to \infty$, the binomial distribution
becomes Poisson, and we obtain the desired protocol.



The Poisson Multinomial Sampling (PMS) Protocol($\alpha$):

Same as the improved protocol, except each
sensor $v$ samples $X_v \sim \mathrm{Poisson}(\alpha p_v)$.



Straightforward calculation shows that

$$\Pr[S = \emptyset] = \prod_v \exp\{-\alpha \cdot p_v\} = \exp\Big\{-\sum_v \alpha \cdot p_v\Big\} = e^{-\alpha}.$$

Let $C$ be the number of messages. Then

$$E[C] = \sum_v \Pr[X_v \ge 1] = \sum_v (1 - e^{-\alpha p_v}) \le \sum_v \alpha p_v = \alpha.$$

Here we have used linearity of expectation, and $1 + x \le e^x$
for all $x \in \mathbb{R}$. In summary, we have the following result about
our protocol: 

Proposition 1. Fix any $p$ and $\alpha > 0$. The PMS
Protocol always selects at most one sensor, ensures

$$\forall v : \Pr[v \text{ selected}] = (1 - e^{-\alpha}) p_v,$$

and requires no more than $\alpha$ messages in expectation.
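The statement of Proposition 1 can be checked empirically with a short simulation. The sketch below is illustrative only: the Poisson sampler uses Knuth's multiplication method (a choice made here because the Python standard library has no Poisson generator), one message is counted per activated sensor, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, running = 0, rng.random()
    while running > threshold:
        count += 1
        running *= rng.random()
    return count


def pms_select(
    p: Sequence[float], alpha: float, rng: random.Random
) -> Tuple[Optional[int], int]:
    """Return (selected sensor or None, number of activated sensors)."""
    counts = [sample_poisson(alpha * p_v, rng) for p_v in p]
    active = [v for v, x in enumerate(counts) if x >= 1]
    if not active:
        return None, 0
    # Select v with probability X_v / sum of X over the active sensors.
    chosen = rng.choices(active, weights=[counts[v] for v in active])[0]
    return chosen, len(active)


def estimate(
    p: Sequence[float], alpha: float = 1.0, trials: int = 100_000, seed: int = 0
) -> Tuple[List[float], float]:
    """Empirical selection frequencies and average messages per run."""
    rng = random.Random(seed)
    hits = [0] * len(p)
    messages = 0
    for _ in range(trials):
        chosen, n_active = pms_select(p, alpha, rng)
        messages += n_active
        if chosen is not None:
            hits[chosen] += 1
    return [h / trials for h in hits], messages / trials
```

With $\alpha = 1$ the frequencies should be close to $(1 - 1/e)\,p_v$, i.e., every sensor is scaled by the same factor, in contrast to the simple protocol.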

In order to ensure that exactly one sensor is selected, whenever
$S = \emptyset$ we can simply rerun the protocol with fresh
random seeds as many times as needed until $S$ is non-empty.
Using $\alpha = 1$, this modification will require only $O(1)$ messages
in expectation and at most $O(\log n)$ messages with
high probability in the broadcast model. We can combine this
protocol with EXP3 to get the following result.

THEOREM 2. In the broadcast model, running EXP3 using
the PMS Protocol with $\alpha = 1$, and rerunning the protocol
whenever nothing is selected, yields exactly the same
regret bound as standard EXP3, and in each round at most
$e/(e - 1) + 2 \approx 3.582$ messages are broadcast in expectation.



The regret bound for EXP3 is $O(\sqrt{\mathrm{OPT}\, n \log n})$, where
OPT is the total reward of the best action. Our variant simulates
EXP3, and thus has identical regret. Proofs of our
theoretical results can be found in the Appendix. 

Remark. Running our variant of EXP3 requires that each
sensor know the number of sensors, $n$, in order to compute its
activation probability. If each sensor $v$ has only a reasonable
estimate $n_v$ of $n$, however, our algorithm still performs
well. For example, it is possible to prove that if all of the
sensors have the same estimate $n_v = cn$ for some constant
$c > 0$, then the upper bound on expected regret, $R(c)$, grows
as $R(c) \approx R(1) \cdot \max\{c, 1/c\}$. The expected number of
activations in this case increases by at most $(\frac{1}{c} - 1)\gamma$. In
general, underestimating $n$ leads to more activations, and
underestimating or overestimating $n$ can lead to more regret.
This graceful degradation of performance with respect to the
error in estimating $n$ holds for all of our algorithms.

4.2 The Distributed Online Greedy Algorithm 

We now use our single sensor selection algorithm to de- 
velop our main algorithm, the Distributed Online Greedy 
algorithm (DOG). It is based on the distributed implementa- 
tion of EXP3 using the PMS Protocol. Suppose we would 
like to select $k$ sensors at each round $t$. Each sensor $v$ maintains
$k$ weights $w_{v,1}, \dots, w_{v,k}$ and normalizing constants
$Z_{v,1}, \dots, Z_{v,k}$. The algorithm proceeds in $k$ stages, synchronized
using the common clock. In stage $i$, a single sensor is
selected using the PMS Protocol applied to the distribution
$p_{v,i} = (1 - \gamma) w_{v,i} / Z_{v,i} + \gamma / n$. Suppose sensors $S = \{v_1, \dots, v_{i-1}\}$
have been selected in stages 1 through $i - 1$. The sensor $v$
selected at stage $i$ then computes its local reward $\pi_{v,i}$ using
the utility function, $\pi_{v,i} = f_t(S \cup \{v\}) - f_t(S)$. It then computes
its new weight

$$w'_{v,i} = w_{v,i} \exp(\eta \cdot \pi_{v,i} / p_{v,i})$$

and broadcasts the difference between its new and old weights,
$\Delta_{v,i} = w'_{v,i} - w_{v,i}$. All sensors then update their $i$-th normalizers
using $Z_{v,i} \leftarrow Z_{v,i} + \Delta_{v,i}$. Fig. 1 presents the pseudo-code
of the DOG algorithm. Thus, given Theorem 12 of [27], we
have the following result about the DOG algorithm:

THEOREM 3. The DOG algorithm selects, at each round
$t$, a set $S_t \subseteq V$ of $k$ sensors such that



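$$\frac{1}{T}\, E\left[\sum_{t=1}^{T} f_t(S_t)\right] \ge \frac{1 - \frac{1}{e}}{T} \max_{|S| \le k} \sum_{t=1}^{T} f_t(S) - O\left(k \sqrt{\frac{n \log n}{T}}\right).$$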



In expectation, only O(k) messages are exchanged each 
round. 
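
To make the stage structure of DOG concrete, the following Python sketch simulates one round centrally. It is an illustration under stated assumptions, not the distributed protocol itself: each stage samples directly from the EXP3 distribution (the distribution the PMS Protocol is described as sampling from), the utility is a toy coverage function, and the names `coverage` and `dog_round` are hypothetical.

```python
import math
import random

def coverage(selected, covers, universe_size):
    """Monotone submodular utility: fraction of items covered by the set."""
    covered = set()
    for v in selected:
        covered |= covers[v]
    return len(covered) / universe_size

def dog_round(stage_weights, covers, universe_size, gamma, eta, rng):
    """Run the k stages of one round and return the selected sensors.

    stage_weights[i][v] plays the role of w_{v,i} and is updated in place.
    """
    n = len(covers)
    chosen = []
    for weights in stage_weights:
        total = sum(weights)  # the stage normalizer, kept exactly in this sketch
        probs = [(1.0 - gamma) * w / total + gamma / n for w in weights]
        v = rng.choices(range(n), weights=probs, k=1)[0]
        # Marginal benefit of v given the sensors chosen in earlier stages.
        gain = (coverage(chosen + [v], covers, universe_size)
                - coverage(chosen, covers, universe_size))
        # Only the selected sensor's weight changes, as in EXP3.
        weights[v] *= math.exp(eta * gain / probs[v])
        chosen.append(v)
    return chosen
```

Calling `dog_round` repeatedly with the same `stage_weights` lets stage $i$ learn which sensor adds the most value on top of the choices made in stages $1, \ldots, i-1$. On a fixed objective, the average utility should drift toward that of the offline greedy solution, up to the exploration probability `gamma`.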

5. THE STAR NETWORK MODEL 

In some applications, the assumption that sensors can broadcast
messages to all sensors may be unrealistic. Furthermore,
in some applications sensors may not be able to compute the
marginal benefits $f_t(S \cup \{s\}) - f_t(S)$ (since this calculation
may be computationally complex). In this section, we analyze
LAZYDOG, a variant of our DOG algorithm, which
replaces the above assumptions by the assumption that there is
a dedicated base station⁴ available which computes utilities
and which can send non-broadcast messages to individual
sensors.

We make the following assumptions: 



⁴ Though the existence of such a base station means the protocol is
not completely distributed, it is realistic in sensor network applications
where the sensor data needs to be accumulated somewhere for
analysis.



1. Every sensor stores its own probability mass $p_v$, and
can only send messages to and receive messages from
the base station.

2. The base station is able, after receiving messages from
a set $S$ of sensors, to compute the utility $f_t(S)$ and send
this utility back to the active sensors.

These conditions arise, for example, when cell phones in 
participatory sensor networks can contact the base station, but 
due to privacy constraints cannot directly call other phones. 
We do not assume that the base station has access to all
weights of the sensors - we will only require the base station
to have $O(k + \log n)$ memory. In the fully distributed
algorithm DOG that relies on broadcasts, it is easy for the
sensors to maintain their normalizers $Z_{v,i}$, since they receive
information about rewards from all selected sensors. The
key challenge when removing the broadcast assumption is to
maintain the normalizers in an appropriate manner.

5.1 Lazy renormalization & Distributed EXP3 

EXP3 (and every MAB algorithm with no-regret guarantees against
arbitrary reward functions) must maintain a distribution over
actions, and update this distribution in response to feedback
about the environment. In EXP3, each sensor $v$ requires
only $w_v(t)$ and a normalizer $Z(t) := \sum_{v'} w_{v'}(t)$ to compute
$p_v(t)$⁵. The former changes only when $v$ is selected. In the
broadcast model the latter can simply be broadcast at the end
of each round. In the star network model (or, more generally,
in multi-hop models), standard flooding echo aggregation
techniques could be used to compute and distribute the new
normalizer, though with high communication cost. We show
that a lazy renormalization scheme can significantly reduce
the amount of communication needed by a distributed bandit
algorithm without altering its regret bounds whatsoever. Thus
our lazy scheme is complementary to standard aggregation
techniques.

Our lazy renormalization scheme for EXP3 works as follows.
Each sensor $v$ maintains its weight $w_v(t)$ and an estimate
$Z_v(t)$ for $Z(t) := \sum_{v'} w_{v'}(t)$. Initially, $w_v(0) = 1$ and
$Z_v(0) = n$ for all $v$. The central server stores $Z(t)$. Let

$$p(x, y) := (1-\gamma)\frac{x}{y} + \frac{\gamma}{n}.$$

Each sensor then proceeds to activate as in the sampling
procedure of Sec. 4.1 as if its probability mass in round
$t$ were $q_v = p(w_v(t), Z_v(t))$ instead of its true value of
$p(w_v(t), Z(t))$. A single sensor is selected by the server
with respect to the true value $Z(t)$, resulting in a selection
from the desired distribution. Moreover, $v$'s estimate $Z_v(t)$
is only updated on rounds when it communicates with the


⁵ We let $x(t)$ denote the value of variable $x$ at the start of round $t$, to
ease analysis. We do not actually need to store the historical values
of the variables over multiple time steps.



Algorithm: Distributed Online Greedy (DOG) (described in the broadcast model)

Input: $k \in \mathbb{N}$, a set $V$, and $\alpha, \gamma, \eta \in \mathbb{R}_{>0}$. Reasonable defaults are any $\alpha \in [1, \ln |V|]$, and
$\gamma = \eta = \min\left(1, (|V| \ln |V| / g)^{1/2}\right)$, where $g$ is a guess for the maximum cumulative reward of any single sensor [2].
Initialize $w_{v,i} \leftarrow 1$ and $Z_{v,i} \leftarrow |V|$ for all $v \in V$, $i \in [k]$. Let $p(x, y) := (1 - \gamma)\frac{x}{y} + \frac{\gamma}{|V|}$.

for each round $t = 1, 2, 3, \ldots$ do
    Initialize $S_{v,t} \leftarrow \emptyset$ for each $v$ in parallel.
    for each stage $i \in [k]$ do
        for each sensor $v$ in parallel do
            repeat
                Sample $X_v \sim \text{Poisson}(\alpha \cdot p(w_{v,i}, Z_{v,i}))$.
                if ($X_v \ge 1$) then
                    Broadcast (sampled $X_v$, id($v$)); Receive messages from sensors $S$. (Include $v \in S$ for convenience.)
                    if id($v$) $= \min_{v' \in S}$ id($v'$) then
                        Select exactly one element $v_{it}$ from $S$ such that each $v'$ is selected with probability $X_{v'} / \sum_{u \in S} X_u$.
                        Broadcast (select id($v_{it}$)).
                Receive message (select id($v_{it}$)).
                if id($v$) = id($v_{it}$) then
                    Observe $f_t(S_{v,t} + v)$; $\pi \leftarrow f_t(S_{v,t} + v) - f_t(S_{v,t})$; $\Delta_v \leftarrow w_{v,i}\left(\exp\{\eta \cdot \pi / p(w_{v,i}, Z_{v,i})\} - 1\right)$;
                    $Z_{v,i} \leftarrow Z_{v,i} + \Delta_v$; $w_{v,i} \leftarrow w_{v,i} + \Delta_v$; Broadcast (weight update $\Delta_v$, id($v$)).
                if receive message (weight update $\Delta$, id($v_{it}$)) then $S_{v,t} \leftarrow S_{v,t} \cup \{v_{it}\}$; $Z_{v,i} \leftarrow Z_{v,i} + \Delta$;
            until $v$ receives a message of type (select id);

Output: At the end of each round $t$ each sensor has an identical local copy $S_{v,t}$ of the selected set $S_t$.

Figure 1: The Distributed Online Greedy Algorithm



server under these circumstances. This allows the estimated
probabilities of all of the sensors to sum to more than one, but
has the benefit of significantly reducing the communication
cost in the star network model under certain assumptions.
We call the result Distributed EXP3, and give its pseudocode for
round $t$ in Fig. 2.

Since the sensors underestimate their normalizers, they 
may activate more frequently than in the broadcast model. 
Fortunately, the amount of "overactivation" remains bounded. 
We prove Theorem 4 and Corollary 5 in Appendix B. 

THEOREM 4. The number of sensor activations in any
round of the Distributed EXP3 algorithm is at most $\alpha + (e - 1)$
in expectation and $O(\alpha + \log n)$ with high probability, and
the number of messages is at most twice the number of activations.

Unfortunately, there is still an $e^{-\alpha}$ probability of nothing
being selected. To address this, we can set $\alpha = c \ln n$ for
some $c \ge 1$, and if nothing is selected, transmit a message to
each of the $n$ sensors to rerun the protocol.
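
The arithmetic behind this choice of $\alpha$ can be written out as a short sketch. It assumes a failed attempt costs $n$ notification messages and that attempts are independent, so the number of failures before the first success is geometric. The function name `rerun_overhead` is hypothetical.

```python
import math

def rerun_overhead(n, c):
    """Return (alpha, failure probability, expected rerun messages per round)."""
    alpha = c * math.log(n)
    fail_prob = math.exp(-alpha)  # equals n ** (-c)
    # Expected number of failed attempts before a success is q / (1 - q);
    # each failed attempt costs one message to each of the n sensors.
    expected_messages = n * fail_prob / (1.0 - fail_prob)
    return alpha, fail_prob, expected_messages
```

For example, with `n = 1000` and `c = 1` the failure probability is $1/1000$, so reruns contribute about one extra message per round in expectation.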

COROLLARY 5. There is a distributed implementation of
EXP3 that always selects a sensor in each round, has the
same regret bounds as standard EXP3, ensures that the number
of sensor activations in any round is at most $\ln n + O(1)$
in expectation or $O(\log n)$ with high probability, and in which
the number of messages is at most twice the number of activations.



5.2 LazyDOG 

Once we have the distributed EXP3 variant described
above, we can use it for the bandit subroutines in the OG$_{\text{UNIT}}$
algorithm (cf. Sec. 3.2). We call the result the LAZYDOG
algorithm, due to its use of lazy renormalization. The lazy
distributed EXP3 still samples sensors from the same distribution
as the regular distributed EXP3, so LAZYDOG has
precisely the same performance guarantees with respect to
$\sum_t f_t(S_t)$ as DOG. It works in the star network communication
model, and requires few messages or sensor activations.
Corollary 5 immediately implies the following result.

COROLLARY 6. The number of sensors that activate each
round in LAZYDOG is at most $k \ln n + O(k)$ in expectation
and $O(k \log n)$ with high probability, the number of messages
is at most twice the number of activations, and the $(1 - 1/e)$-regret
of LAZYDOG is the same as DOG.

If we are not concerned about the exact number of sensors
selected in each round, but only want to ensure roughly $k$
sensors are picked in expectation, then we can reduce the
number of sensor activations and messages to $O(k)$, by running
LAZYDOG with $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ stages for some
constant $\alpha$, and allowing each stage to run the Poisson Multinomial
Sampling Protocol with lazy renormalization without
rerunning it if nothing is selected. This is of course optimal
up to constants, as we must send at least one message per
selected sensor.
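
As a quick worked example of the stage count, the sketch below computes $k'$ and the resulting expected number of selections, assuming each stage independently selects a sensor with probability $1 - e^{-\alpha}$. The function names are illustrative.

```python
import math

def stages_needed(k, alpha):
    """k' = ceil(k / (1 - e^-alpha)): stages to run without reruns."""
    success_prob = 1.0 - math.exp(-alpha)
    return math.ceil(k / success_prob)

def expected_selected(k, alpha):
    """Expected number of stages that select a sensor when running k' stages."""
    return stages_needed(k, alpha) * (1.0 - math.exp(-alpha))
```

With `k = 10`, this gives 16 stages for `alpha = 1.0` and 12 stages for `alpha = 2.0`. In both cases the expected number of selected sensors is at least 10.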



Algorithm: Distributed EXP3 (executing on round $t$)
Input: Parameters $\alpha, \eta, \gamma \in \mathbb{R}_{>0}$, sensor set $V$.

Let $p(x, y) := (1 - \gamma)\frac{x}{y} + \frac{\gamma}{|V|}$.
Sensors:
    foreach sensor $v$ in parallel do
        Sample $r_v$ uniformly at random from $[0, 1]$.
        if ($r_v \ge 1 - \alpha \cdot p(w_v(t), Z_v(t))$) then
            Send ($r_v$, $w_v(t)$) to the server.
            Receive message ($Z$, $w$) from server.
            $Z_v(t + 1) \leftarrow Z$; $w_v(t + 1) \leftarrow w$.
        else $Z_v(t + 1) \leftarrow Z_v(t)$; $w_v(t + 1) \leftarrow w_v(t)$.
Server:
    Receive messages from a set $S$ of sensors.
    if $S = \emptyset$ then Select nothing and wait for next round.
    else
        foreach sensor $v \in S$ do
            $Y_v \leftarrow \min\{x : \Pr[X \le x] \ge r_v\}$, where $X \sim \text{Poisson}(\alpha \cdot p(w_v(t), Z(t)))$.
        Select $v$ with probability $Y_v / \sum_{v' \in S} Y_{v'}$.
        Observe the payoff $\pi$ for the selected sensor $v^*$;
        $w_{v^*}(t + 1) \leftarrow w_{v^*}(t) \cdot \exp\{\eta \pi / p(w_{v^*}(t), Z(t))\}$;
        $Z(t + 1) \leftarrow Z(t) + w_{v^*}(t + 1) - w_{v^*}(t)$;
        for each $v \in S \setminus v^*$ do $w_v(t + 1) \leftarrow w_v(t)$;
        for each $v \in S$ do Send ($Z(t+1)$, $w_v(t+1)$) to $v$.

Figure 2: Distributed EXP3: the PMS Protocol($\alpha$)
with lazy renormalization, applied to EXP3
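
The round in Fig. 2 can be simulated in one process to see how stale normalizer estimates interact with the server's exact normalizer. The sketch below is a simplified illustration, not the authors' implementation. It assumes a `payoff` callback that returns the reward of a sensor, treats a round in which every recomputed sample is zero as selecting nothing, and uses the hypothetical names `poisson_quantile` and `distributed_exp3_round`.

```python
import math
import random

def p(x, y, gamma, n):
    """Mixture of the weight-proportional and uniform distributions."""
    return (1.0 - gamma) * x / y + gamma / n

def poisson_quantile(r, lam):
    """Smallest x with Pr[Poisson(lam) <= x] >= r (inverse-CDF sampling)."""
    x, term = 0, math.exp(-lam)
    cdf = term
    while cdf < r and term > 0.0:  # the term > 0 guard stops on underflow
        x += 1
        term *= lam / x
        cdf += term
    return x

def distributed_exp3_round(w, z_est, z_true, alpha, gamma, eta, payoff, rng):
    """One round; mutates w and z_est, returns (selected, new Z, activated)."""
    n = len(w)
    # Sensor side: activate using the local, possibly stale, normalizer.
    r = [rng.random() for _ in range(n)]
    active = [v for v in range(n)
              if r[v] >= 1.0 - alpha * p(w[v], z_est[v], gamma, n)]
    # Server side: recompute each sample from r_v with the true normalizer.
    y = {v: poisson_quantile(r[v], alpha * p(w[v], z_true, gamma, n))
         for v in active}
    selected = None
    if sum(y.values()) > 0:
        selected = rng.choices(list(y), weights=list(y.values()), k=1)[0]
        prob = p(w[selected], z_true, gamma, n)
        new_w = w[selected] * math.exp(eta * payoff(selected) / prob)
        z_true += new_w - w[selected]
        w[selected] = new_w
    for v in active:  # only sensors that contacted the server learn the new Z
        z_est[v] = z_true
    return selected, z_true, active
```

Because weights only grow when payoffs are non-negative, each local estimate in `z_est` stays at or below the true normalizer. Sensors therefore never under-activate relative to the exact protocol.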



THEOREM 7. The variant of LAZYDOG that runs the
Poisson Multinomial Sampling Protocol($\alpha$) with lazy renormalization
for $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ stages, but does not
rerun it if nothing is selected in a given stage, has the following
guarantees: (1) the number of sensors that activate each
round in LAZYDOG is at most $k'(\alpha + e - 1)$ in expectation
and $O(\alpha k \log n)$ with high probability, (2) the number of
messages is at most twice the number of activations, (3) the
expected number of sensors selected in each round is at most
$k'$ and (4) its $(1 - 1/e)$-regret is at most $k'/k$ times that of
DOG.

We defer the proof to Appendix C. 

6. OBSERVATION-DEPENDENT SAMPLING 

Theorem 3 states that DOG is guaranteed to do nearly
as well as the offline greedy algorithm run on an instance
with objective function $f_\Sigma := \sum_t f_t$. Thus the reward of
DOG is asymptotically near-optimal on average. In many
applications, however, we would like to perform well on 
rounds with "atypical" objective functions. For example, in 
an outbreak detection application as we discuss in Sec. 7, we 
would like to get very good data on rounds with significant 
events, even if the nearest sensors typically report "boring" 
readings that contribute very little to the objective function. 



For now, suppose that we are only running a single MAB
instance to select a single sensor in each round. If we have
access to a black-box for evaluating $f_t$ on round $t$, then we
can perform well on atypical rounds at the cost of some
additional communication by having each sensor $v$ take a
local reading of its environment and estimate its payoff
$\pi = f_t(\{v\})$ if selected. This value, which serves as a measure of
how interesting its input is, can then be used to decide whether
to boost $v$'s probability for reporting its sensor reading to the
server. In the simplest case, we can imagine that each $v$ has a
threshold $\tau_v$ such that $v$ activates with probability 1 if $\pi \ge \tau_v$,
and with its normal probability otherwise. In the case where
we select $k > 1$ sensors in each round, each sensor can have
a threshold for each of the $k$ stages, where in each stage it
computes $\pi = f_t(S \cup \{v\}) - f_t(S)$ where $S$ is the set of
currently selected sensors. Since the activation probability
only goes up, we can retain the performance guarantees of
DOG if we are careful to adjust the feedback properly.

Ideally, we wish that the sensors learn what their thresholds
$\tau_v$ should be. We treat the selection of $\tau_v$ in each round
as an online decision problem that each $v$ must play. We
construct a particular game that the sensors play, where the
strategies are the thresholds (suitably discretized), there is an
activation cost $c_v$ that $v$ pays if $\pi_v \ge \tau_v$, and the payoffs are
defined as follows: Let $\pi_v = f_t(S \cup \{v\}) - f_t(S)$ be the
marginal benefit of selecting $v$ given that sensor set $S$ has
already been selected. Let $A$ be the set of sensors that activate
in the current iteration of the game, and let
$\max \pi_{(A \setminus v)} := \max\{\pi_{v'} : v' \in A \setminus \{v\}\}$. The particular reward function $\psi_v$
we choose for each sensor $v$ for each iteration of the game is

$$\psi_v(\tau) = \begin{cases} c_v - \max\left(\pi_v - \max \pi_{(A \setminus v)},\, 0\right) & \text{if } \pi_v < \tau \\ \max\left(\pi_v - \max \pi_{(A \setminus v)},\, 0\right) - c_v & \text{if } \pi_v \ge \tau \end{cases}$$

based on empirical performance. Thus, if a sensor activates
($\pi_v \ge \tau$), its payoff is the improvement over the best payoff
$\pi_{v'}$ among all sensors $v' \in A$ minus its activation cost. In
case multiple sensors activate, the highest reward is retained.
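
The reward rule for the threshold game can be sketched as a small Python function. Two details are assumptions made only for illustration: the best competing benefit is taken to be `0.0` when no other sensor activates, and activation uses the test `pi_v >= tau`. The name `threshold_reward` is hypothetical.

```python
def threshold_reward(pi_v, tau, best_other, cost):
    """Reward for playing threshold tau when the marginal benefit is pi_v.

    best_other is the largest marginal benefit among the other activated
    sensors (assumed to be 0.0 if none activated); cost is c_v.
    """
    improvement = max(pi_v - best_other, 0.0)
    if pi_v >= tau:
        # The sensor activates: it earns its improvement and pays the cost.
        return improvement - cost
    # The sensor stays silent: it saves the cost but forgoes the improvement.
    return cost - improvement
```

For a reading with `pi_v = 0.8`, a best competitor of `0.3` and `cost = 0.1`, a threshold of `0.5` yields a reward of about `0.4`, while a threshold of `0.9` yields about `-0.4`. Low thresholds are rewarded exactly when the improvement exceeds the activation cost.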

In the broadcast model where each sensor can compute its
marginal benefit, we can use any standard no-regret algorithm
for combining expert advice, such as Randomized Weighted
Majority (WMR) [21], to play this game and obtain no-regret
guarantees⁶ for selecting $\tau_v$. In our context a sensor using
WMR simply maintains weights $w(\tau_i) = \exp(\eta \cdot \psi_{\text{total}}(\tau_i))$
for each possible threshold $\tau_i$, where $\eta > 0$ is a learning
parameter, and $\psi_{\text{total}}(\tau_i)$ is the total cumulative reward for
playing $\tau_i$ in every round so far. On each step each threshold
is picked with probability proportional to its weight. In the
more restricted star network model, we can use a modification
of WMR that feeds back unbiased estimates for $\psi_t(\tau_i)$, the
⁶ We leave it as an open problem to determine if the outcome is close
to optimal when all sensors play low regret strategies (i.e., is the
price of total anarchy [4] small in any variant of this game with a
reasonable way of splitting the value from the information?)



payoff to the sensor for using a threshold of $\tau_i$ in round $t$,
and thus obtains reasonably good estimates of $\psi_{\text{total}}(\tau_i)$ after
many rounds. We give pseudocode in Fig. 3. In it, we assume
that an activated sensor can compute the reward of playing
any threshold.
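
The modified WMR update can be sketched as a small Python class. This is an illustration under the assumptions stated in the text: weights are updated only on rounds in which the sensor activates, and each reward is divided by the activation probability its threshold would have had. The class and method names are hypothetical.

```python
import math
import random

class ThresholdLearner:
    """Sketch of the modified WMR update over a discretized threshold set."""

    def __init__(self, thresholds, eta, rng=None):
        self.thresholds = list(thresholds)
        self.eta = eta
        self.weights = [1.0] * len(self.thresholds)
        self.rng = rng or random.Random()

    def choose(self):
        """Pick a threshold index with probability proportional to its weight."""
        indices = range(len(self.thresholds))
        return self.rng.choices(indices, weights=self.weights, k=1)[0]

    def update(self, rewards, activation_probs):
        """Call only on rounds in which the sensor activated.

        rewards[i] is the reward of threshold i this round; activation_probs[i]
        is the activation probability had threshold i been selected.
        """
        for i, (psi, q) in enumerate(zip(rewards, activation_probs)):
            # Dividing by q gives an importance-weighted reward estimate.
            self.weights[i] *= math.exp(self.eta * psi / q)
```

A sensor would call `choose()` at the start of a round and `update(...)` only if it activated, passing the reward every threshold would have earned.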



We incorporate these ideas into the DOG algorithm, to
obtain what we call the Observation-Dependent Distributed
Online Greedy algorithm (OD-DOG). In the extreme case
that $c_v = 0$ for all $v$, the sensors will soon set their thresholds
so low that each sensor activates in each round. In this case
OD-DOG will exactly simulate the offline greedy algorithm
run on each round. In other words, if we let $G(f)$ be the
result of running the offline greedy algorithm on the problem

$$\arg\max \{ f(S) : S \subseteq V, |S| \le k \}$$

then OD-DOG will obtain a value of $\sum_t f_t(G(f_t))$; in contrast,
DOG gets roughly $\sum_t f_t(G(\sum_t f_t))$, which may be
significantly smaller. Note that Feige's result [12] implies that
the former value is the best we can hope for from efficient
algorithms (assuming P $\ne$ NP). Of course, querying each
sensor in each round is impractical when querying sensors is
expensive. In the other extreme case where $c_v = \infty$ for all $v$,
OD-DOG will simulate DOG after a brief learning phase. In
general, by adjusting the activation costs $c_v$ we can smoothly
trade off the cost of sensor communication with the value of
the resulting data.
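
The gap between greedy run on each round and greedy run on the summed objective can be seen on a toy instance. The sketch below assumes coverage-style utilities; the helper names `greedy` and `make_coverage` and the two-sensor data are invented purely for illustration.

```python
def greedy(f, ground_set, k):
    """Offline greedy: repeatedly add the element with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max(ground_set, key=lambda v: f(chosen + [v]) - f(chosen))
        chosen.append(best)
    return chosen

def make_coverage(covers):
    """Coverage utility f(S) = number of items covered by S."""
    def f(selected):
        covered = set()
        for v in selected:
            covered |= covers[v]
        return len(covered)
    return f

sensors = ["a", "b"]
f1 = make_coverage({"a": {1, 2, 3}, "b": {1}})      # round 1 favours "a"
f2 = make_coverage({"a": {1}, "b": {1, 2, 3, 4}})   # round 2 favours "b"

per_round_value = f1(greedy(f1, sensors, 1)) + f2(greedy(f2, sensors, 1))
fixed_set = greedy(lambda s: f1(s) + f2(s), sensors, 1)
fixed_value = f1(fixed_set) + f2(fixed_set)
```

Here `per_round_value` is 7 while `fixed_value` is 5, because a single fixed set cannot adapt to the round-specific objective.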

7. EXPERIMENTS 

In this section, we evaluate our DOG algorithm on several 
real-world sensing problems. 

7.1 Data sets 

Temperature data. In our first data set, we analyze temper- 
ature measurements from the network of 46 sensors deployed 



at Intel Research Berkeley. Our training data consisted of
samples collected at 30 second intervals on 3 consecutive
days (starting Feb. 28th 2004); the testing data consisted of
the corresponding samples on the two following days. The
objective functions used for this application are based on the
expected reduction in mean squared prediction error $f_{\mathrm{EMSE}}$,
as introduced in Sec. 2.

Precipitation data. Our second data set consists of precipi- 
tation data collected during the years 1949 - 1994 in the states 
of Washington and Oregon [30]. Overall 167 regions of equal 
area, approximately 50 km apart, reported the daily precipita- 
tion. To ensure the data could be reasonably modeled using 
a Gaussian process we applied preprocessing as described 
in [19]. As objective functions we again use the expected
reduction in mean squared prediction error $f_{\mathrm{EMSE}}$.

Water network monitoring. Our third data set is based on 
the application of monitoring for outbreak detection. Con- 
sider a city water distribution network for delivering drinking 
water to households. Accidental or malicious intrusions can 
cause contaminants to spread over the network, and we want 
to install sensors to detect these contaminations as quickly 
as possible. In August 2006, the Battle of Water Sensor Net- 
works (BWSN) [11] was organized as an international chal- 
lenge to find the best sensor placements for a real metropolitan 
water distribution network, consisting of 12,527 nodes. In 
this challenge, a set of intrusion scenarios is specified, and 
for each scenario a realistic simulator provided by the EPA 
is used to simulate the spread of the contaminant for a 48 
hour period. An intrusion is considered detected when one 
selected node shows positive contaminant concentration. The 
goal of BWSN was to minimize impact measures, such as 
the expected population affected, which is calculated using 
a realistic disease model. For a security-critical sensing task 
such as protecting drinking water from contamination, it is 
important to develop sensor selection schemes that maximize 
detection performance even in adversarial environments (i.e., 
where an adversary picks the contamination strategy know- 
ing our network deployment and selection algorithm). The 
algorithms developed in this paper apply to such adversar- 
ial settings. We reproduce the experimental setup detailed 
in [18]. For each contamination event $i$, we define a separate
submodular objective function $f_i(S)$ that measures the
expected population protected when detecting the contamination
from sensors $S$. In [18], Krause et al. showed that the
functions $f_i(A)$ are monotone submodular functions.

7.2 Convergence experiments 

In our first set of experiments, we analyzed the convergence
of our DOG algorithm. For both the temperature [T]
and precipitation [R] data sets, we first run the offline greedy
algorithm using the $f_{\mathrm{EMSE}}$ objective function to pick $k = 5$



Algorithm: Modified WMR (star network setting)

Input: parameter $\eta > 0$, threshold set $\{\tau_i : i \in [m]\}$

Initialize $w(\tau_i) \leftarrow 1$ for all $i \in [m]$.

for each round $t = 1, 2, \ldots$ do
    Select $\tau_i$ with probability $w(\tau_i) / \sum_{j=1}^{m} w(\tau_j)$.
    if sensor activates then
        Let $\psi(\tau_i)$ be the reward for playing $\tau_i$ in this
        round of the game. Let $q(\tau_i)$ be the total
        probability of activation conditioned on $\tau_i$ being
        selected (including the activation probability that
        does not depend on local observations).
        for each threshold $\tau_i$ do
            $w(\tau_i) \leftarrow w(\tau_i) \exp\left(\eta\, \psi(\tau_i) / q(\tau_i)\right)$.

Figure 3: Selecting activation thresholds for a sensor



sensors. We compare its performance to the DOG algorithm,
where we feed back the same objective function at every
round. We use an exploration probability $\gamma = 0.01$ and a
learning rate inversely proportional to the maximum achievable
reward $f_{\mathrm{EMSE}}(V)$. Fig. 4(a) presents the results for
the temperature data set. Note that even after only a small
number of rounds ($\approx 100$), the algorithm obtains 95% of the
performance of the offline algorithm. After about 13,000 iterations,
the algorithm obtains 99% of the offline performance,
which is the best that can be expected with a 0.01 exploration
probability. Fig. 4(b) shows the same experiment on the
precipitation data set. In this more complex problem, after 100
iterations, 76% of the offline performance is obtained, which
increases to 87% after 500,000 iterations.

7.3 Observation dependent activation 

We also experimentally evaluate our OD-DOG algorithm 
with observation specific sensor activations. We choose dif- 
ferent values for the activation cost c v , which we vary as 
multiples of the total achievable reward. The activation cost 
c v lets us smoothly trade off the average number of sensors 
activating each round and the average obtained reward. The 
resulting activation strategies are used to select a subset of
size $k = 10$ from a collection of 12,527 sensors. Fig. 4(c)
presents rates of convergence using the OD-DOG algorithm 
under a fixed objective function which considers all contami- 
nation events. In Fig. 4(d), convergence rates are presented 
under a varying objective function, which selects a differ- 
ent contamination event on each round. For low activation 
costs, the performance quickly converges to or exceeds the 
performance of the offline solution. Even under the lowest ac- 
tivation costs in our experiments, the average number of extra 
activations per stage in the OD-DOG algorithm is at most 5. 
These results indicate that observation specific activation can 
lead to drastically improved performance at small additional 
activation cost. 

8. RELATED WORK 

Sensor Selection. The problem of deciding when to selec- 
tively turn on sensors in sensor networks in order to conserve 
power was first discussed by [26] and [32]. Many approaches 
for optimizing sensor placements and selection assume that 
sensors have a fixed region [15, 14, 3]. These regions are 
usually convex or even circular. Further, it is assumed that ev- 
erything within a region can be perfectly observed, and every- 
thing outside cannot be measured by the sensors. For complex 
applications such as environmental monitoring, these assump- 
tions are unrealistic, and the direct optimization of prediction 
accuracy is desired. The problem of selecting observations 
for monitoring spatial phenomena has been investigated ex- 
tensively in geostatistics [8], and more generally (Bayesian) 
experimental design [6]. Several approaches have been pro- 



posed to activate sensors in order to minimize uncertainty 
[32] or prediction error [10]. However, these approaches do 
not have performance guarantees. Submodularity has been 
used to analyze algorithms for placing [19] or selecting [31] 
a fixed set of sensors. These approaches however assume that 
the model is known in advance. 

Submodular optimization. The problem of centralized 
maximization of a submodular function has been studied by 
[23], who proved that the greedy algorithm gives a factor 
(1 — 1/e) approximation. Several algorithms have since been 
developed for maximizing submodular functions under more 
complex constraints (see [29] for an overview). Streeter and 
Golovin developed an algorithm for online optimization of 
submodular functions, which we build on in this paper [28]. 

9. CONCLUSIONS 

In this paper, we considered the problem of repeatedly selecting
subsets $S_t$ from a large set of deployed sensors, in
order to maximize a sequence of submodular utility functions
$f_1, \ldots, f_T$. We developed an efficient Distributed Online
Greedy algorithm DOG, and proved it suffers no $(1 - 1/e)$-regret,
essentially the best possible performance obtainable
unless P = NP. Our algorithm is fully distributed, requiring
only a small number of messages to be exchanged at each
round with high probability. We analyze our algorithm both
in the broadcast model, and in the star network model, where
a separate base station is responsible for computing utilities
of selected sets of sensors. Our LAZYDOG algorithm for
the latter model uses lazy renormalization in order to reduce
the number of messages required from $O(n)$ to $O(k \log n)$,
and the server memory required from $\Theta(n)$ to $O(k + \log n)$,
where $k$ is the desired number of sensors to be selected. In
addition, we developed OD-DOG, an extension of DOG that 
allows observation-dependent sensor selection. We empiri- 
cally demonstrate the effectiveness of our algorithms on three 
real-world sensing tasks, demonstrating how our DOG algo- 
rithm's performance converges towards the performance of a 
clairvoyant offline greedy algorithm. In addition, our results 
with the OD-DOG algorithm indicate that a small number of 
extra sensor activations can lead to drastically improved con- 
vergence. We believe that our results provide an interesting 
step towards a principled study of distributed active learning 
and information gathering. 
Figure 4: Experimental results on [T] Temperature data, [R] precipitation data and [W] water distribution network data.
APPENDIX

A. RESULTS IN THE BROADCAST MODEL 

Proof of Theorem 2. To prove the regret bounds, note
that in every round the distribution over sensor selections in
the variant of EXP3 we describe (that uses the distributed
multinomial sampling scheme and repeatedly reruns the protocol
in order to always select some sensor in each round)
is precisely the same as the original EXP3. Thus the regret
bounds for EXP3 [2] carry over unchanged. We next bound
the number of broadcasts. Fix a round, and let $S$ be the set of sensors
that activate in that round. The total number of broadcasts
is then $|S| + 2$: using their calibrated clocks, each sensor
(re)samples $X_v \sim \text{Poisson}(\alpha p_v)$ and activates if $X_v \ge 1$.
If no sensors activate before a specified timeout period, the
default behavior is to rerun the sampling step. Eventually
$|S| \ge 1$ sensors activate in the same period. A distinguished
sensor in $S$ then determines the selected sensor $v$, broadcasts
id($v$), and $v$ broadcasts its observed reward. We prove
$E[|S|] \le \alpha/(1 - e^{-\alpha})$ in Proposition 8. When $\alpha = 1$, this
gives us the claimed bound on the number of broadcasts. □

PROPOSITION 8. Rerunning the Poisson Multinomial Sampling
Protocol until an element is selected results in at most
$\alpha/(1 - e^{-\alpha})$ elements being activated in expectation. Moreover,
this value is tight.

Proof. Let $X_v \sim \text{Bernoulli}(\alpha \cdot p_v)$ be the indicator random
variable for the activation of $v$, and let $X := \sum_v X_v$.
The expected number of sensor activations is then

$$E[X \mid X \ge 1] = E[X] / \Pr[X \ge 1].$$

In the limit as $\max_v p_v$ tends to zero, $X$ converges to a Poisson
random variable with mean $\alpha$. In this case,
$E[X]/\Pr[X \ge 1] = \alpha/(1 - e^{-\alpha})$. To see that this is an upper bound, consider
an arbitrary distribution $p$ on the sensors, and fix some $v$
with $x := p_v > 0$. We claim that replacing $v$ with two sensors
$v_1$ and $v_2$ with positive probability mass $x_1$ and $x_2$ with
$x = x_1 + x_2$ can only serve to increase the expected number
of sensor activations, because $E[X]$ is unchanged, and
$\Pr[X \ge 1]$ decreases. The latter is true essentially because
$\Pr[\exists i \in \{1, 2\} : v_i \text{ activates}] = 1 - (1 - x_1)(1 - x_2) = x - x_1 x_2 < x$.
To complete the proof, notice that repeating
this process with $v = \arg\max_v (p_v)$ and $x_1 = x/2$ ensures
$X$ converges to a Poisson variable with mean $\alpha$, while only
increasing $E[X \mid X \ge 1]$. □
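
The tightness claim can be checked numerically for the uniform distribution. The sketch below assumes independent Bernoulli($\alpha p_v$) activations, as in the proof, and evaluates $E[X]/\Pr[X \ge 1]$ exactly. The function name `activations_given_any` is hypothetical.

```python
import math

def activations_given_any(probs, alpha):
    """E[X | X >= 1] = E[X] / Pr[X >= 1] for independent Bernoulli(alpha * p_v)."""
    mean = sum(alpha * p for p in probs)
    none_active = math.prod(1.0 - alpha * p for p in probs)
    return mean / (1.0 - none_active)

alpha = 1.0
limit = alpha / (1.0 - math.exp(-alpha))
values = [activations_given_any([1.0 / n] * n, alpha) for n in (2, 10, 1000)]
```

As the per-sensor mass shrinks, the entries of `values` increase toward `limit` (about 1.582 for `alpha = 1.0`) without exceeding it. This matches the splitting argument in the proof.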

B. RESULTS IN THE STAR NETWORK 
MODEL 

In this section we will prove that lazy renormalization samples
sensors from a properly scaled distribution $(1 - e^{-\alpha}) p_v$
where $p_v$ is the input distribution. We then bound the com-
munication overhead of using lazy renormalization for any 



MAB algorithm satisfying certain assumptions enumerated 
below, and then show how these bounds apply to EXP3. 

PROPOSITION 9. The lazy renormalization scheme of Sec. 5.1,
described in pseudocode in Fig. 2, samples $v$ with probability
$(1 - e^{-\alpha}) p_v$, where $p_v = p(w_v(t), Z(t))$ is the desired
probability mass for $v$.

PROOF. Lazy renormalization selects each sensor $v$ with
probability $(1 - e^{-\alpha}) p_v$, because of the way the random bits
$r_v$ are shared in order to implement a coupled distribution for
sensor activation and selection. Note that it would be sufficient
to run the Poisson Multinomial Sampling Protocol on
the correct (possibly oversampled) probabilities, $\alpha p_v$, since
then Prop. 1 ensures that each $v$ is selected with probability
$(1 - e^{-\alpha}) p_v$. The difficulty is that $v$ does not have access
to the correct normalizer $Z(t)$, but only its estimate (lower
bound) for it, $Z_v(t)$. To overcome this difficulty, we define
a joint probability distribution over two random variables
$(X_v, Y_v)$, where

$$X_v = X_v(R) := \begin{cases} 1 & \text{if } R \ge 1 - \alpha \cdot p(w_v(t), Z_v(t)) \\ 0 & \text{otherwise} \end{cases}$$

$$Y_v = Y_v(R) := \min\left\{ b : \sum_{a=0}^{b} e^{-\lambda} \frac{\lambda^a}{a!} \ge R \right\}$$

and $\lambda := \alpha \cdot p(w_v(t), Z(t))$, and $R$ is sampled uniformly
at random from $[0, 1]$. Now, note that $Y_v$ is distributed as
Poisson($\lambda$). Also note that $Y_v \ge 1$ implies $X_v \ge 1$, because
$Y_v \ge 1$ implies $R > e^{-\lambda}$ and

$$e^{-\lambda} \ge 1 - \lambda \ge 1 - \alpha \cdot p(w_v(t), Z_v(t))$$

since $1 + x \le e^x$ for all $x \in \mathbb{R}$, and $p(w_v(t), Z_v(t)) \ge
p(w_v(t), Z(t))$ due to the fact that $Z_v(t) \le Z(t)$. It follows that
we can use the event $X_v \ge 1$ as a conservative indicator
that $v$ should activate. In this case, it will send its sampled
value for $R$, namely $r_v$, and its weight $w_v(t)$ to the server.
The server knows $Z(t)$, and then can use $r_v$ and $w_v(t)$ to
compute $Y_v(r_v)$, the sample from Poisson($\lambda$) that $v$ would
have drawn had it known $Z(t)$. The resulting distribution on
selected sensors is thus exactly the same as in the Poisson
Multinomial Sampling Protocol without lazy renormalization.
Invoking Prop. 1 thus completes the proof. □
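
The key implication in this coupling argument, that a positive server-side sample forces the sensor to have activated, can be checked on a grid. The sketch below relies on the fact that the inverse-CDF sample is at least 1 exactly when the shared random value exceeds $e^{-\lambda}$. Both function names are hypothetical.

```python
import math

def sensor_activates(r, alpha, p_local):
    """Sensor-side test, using its local overestimate of its probability."""
    return r >= 1.0 - alpha * p_local

def server_sample_positive(r, alpha, p_true):
    """True exactly when the inverse-CDF Poisson(alpha * p_true) sample is >= 1."""
    return r > math.exp(-alpha * p_true)

def coupling_holds(alpha, steps=200):
    """Check on a grid that a positive server sample implies activation."""
    for i in range(steps + 1):
        r = i / steps
        for p_true in (0.01, 0.05, 0.2, 0.5):
            for inflate in (1.0, 1.5, 3.0):  # stale normalizers inflate p
                p_local = p_true * inflate
                if server_sample_positive(r, alpha, p_true):
                    if not sensor_activates(r, alpha, p_local):
                        return False
    return True
```

`coupling_holds(alpha)` returns `True` for any positive `alpha` on this grid, reflecting $e^{-\lambda} \ge 1 - \lambda$. The converse does not hold: a sensor may activate and still receive a zero sample from the server.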

We now describe the assumptions that are sufficient to 
ensure lazy renormalization has low communication costs. 
Fix an action $v$ and a multiarmed bandit algorithm. Let
$p_v(t) \in [0, 1]$ be the random variable denoting the probability
the algorithm assigns to $v$ on round $t$. The value of
$p_v(t)$ depends on the random choices made by the algorithm
and the payoffs observed by it on previous rounds. We assume
the following about each $p_v(t)$.

1. $p_v(t)$ can be computed from local information $v$ possesses
and global information the server has.

2. There exists an $\epsilon > 0$ such that $p_v(t) \ge \epsilon$ for all $t$.

3. $p_v(t) < p_v(t + 1)$ implies $v$ was selected in round $t$.

4. There exists $\hat{\epsilon} > 0$ such that $p_v(t + 1) \ge p_v(t)/(1 + \hat{\epsilon})$
for all $t$.

Many MAB algorithms satisfy these conditions. For example,
all MAB algorithms with non-trivial no-regret guarantees
against adversarial payoff functions must continually explore
all their options, which effectively mandates $p_v(t) \ge \epsilon$ for
some $\epsilon > 0$. In Lemma 1 we prove that EXP3 does so with
$\epsilon = \gamma/n$ and $\hat{\epsilon} = (e - 1)\frac{\gamma}{n}$, assuming payoffs in $[0, 1]$. In
this case, Theorem 10 bounds the expected increase in sensor
communications due to lazy renormalization by a factor of
$1 + \frac{e - 1}{\alpha}$.

THEOREM 10. Fix a multiarmed bandit instance with possibly
adversarial payoff functions, and a MAB algorithm satisfying
the above assumptions on its distribution over actions
$\{p_v(t)\}_{v \in V}$. Let $q_v(t)$ be the corresponding random estimates
for $p_v(t)$ maintained under lazy renormalization with
oversampling parameter $\alpha$. Then for all $v$ and $t$,

$$E\left[q_v(t)/p_v(t)\right] \le 1 + \frac{\hat{\epsilon}}{\alpha \epsilon}$$

and

$$E\left[q_v(t)\right] \le \left(1 + \frac{\hat{\epsilon}}{\alpha \epsilon}\right) E\left[p_v(t)\right].$$

PROOF. Fix v, and let p(t) := p v (t), q(t) := q v (t). We 
begin by bounding Pr [q(t) > Xp(t)] for A > 1. Let t be 
the most recent round in which q(to) = p(to)- We assume 
q(0) = p(0), so to exists. Then q(t) = p(to) > Xp(t) implies 
p(t )/p(t) > A. By assumption p(t')/p(t' + 1) < (l + e)for 
all t', so p(t )/p{t) < (1 + ey- to . Thus A < (1 + e)*~ to and 
t-t > ln(A)/ ln(l + e). Define t(\) := ln(A) / ln(l + e). 

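By definition of $t_0$, there were no activations under lazy renormalization in rounds $t_0$ through $t - 1$ inclusive, which occurs with probability $\prod_{t'=t_0}^{t-1} (1 - \alpha q(t')) = (1 - \alpha q(t))^{t - t_0} \le (1 - \alpha q(t))^{\lceil t(\lambda) \rceil}$, where $\alpha$ is the oversampling parameter in the protocol. We now bound $E[q(t)/p(t) \mid q(t)]$. Recall that $E[X] = \int_{0}^{\infty} \Pr[X \ge x]\,dx$ for any non-negative random variable $X$. It will also be convenient to define $\omega := \ln(1/(1 - \alpha q(t)))/\ln(1 + \hat{\epsilon})$ and assume for now that $\omega > 1$. Conditioning on $q(t)$, we see that

$$
\begin{aligned}
E[q(t)/p(t) \mid q(t)] &= \int_{0}^{\infty} \Pr[q(t) \ge \lambda p(t)]\,d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \Pr[q(t) \ge \lambda p(t)]\,d\lambda \\
&\le 1 + \int_{\lambda=1}^{\infty} (1 - \alpha q(t))^{t(\lambda)}\,d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \lambda^{\ln(1 - \alpha q(t))/\ln(1 + \hat{\epsilon})}\,d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \lambda^{-\omega}\,d\lambda \\
&= 1 + \frac{1}{\omega - 1}.
\end{aligned}
$$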



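Using $\ln\left(\frac{1}{1-x}\right) \ge x$ for all $x < 1$ and $\ln(1 + x) \le x$ for all $x > -1$, we can show that $\omega \ge \alpha q(t)/\hat{\epsilon}$, so $1 + \frac{1}{\omega - 1} \le \alpha q(t)/(\alpha q(t) - \hat{\epsilon})$. Thus, if $\alpha q(t) > \hat{\epsilon}$ then $\omega > 1$ and we obtain $E[q(t)/p(t) \mid q(t)] \le \alpha q(t)/(\alpha q(t) - \hat{\epsilon})$.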

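If $q(t) \gg \hat{\epsilon}$, this gives a good bound. If $q(t)$ is small, we rely on the assumption that $p(t) \ge \epsilon$ for all $t$ to get a trivial bound of $q(t)/p(t) \le q(t)/\epsilon$. We thus conclude

$$E[q(t)/p(t) \mid q(t)] \le \min\left(\frac{\alpha q(t)}{\alpha q(t) - \hat{\epsilon}},\ \frac{q(t)}{\epsilon}\right). \qquad \text{(B.1)}$$

Setting $q(t) = (\hat{\epsilon}/\alpha + \epsilon)$ to maximize this quantity yields an unconditional bound of $E[q(t)/p(t)] \le 1 + \hat{\epsilon}/(\alpha\epsilon)$.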

To bound $E[q(t)]$ in terms of $E[p(t)]$, note that for all $q$

$$
\begin{aligned}
q / E[p(t) \mid q(t) = q] &\le E[q(t)/p(t) \mid q(t) = q] \\
&\le 1 + \hat{\epsilon}/(\alpha\epsilon)
\end{aligned}
$$

where the first line is by Jensen's inequality, and the second is by equation B.1. Thus $q \le (1 + \hat{\epsilon}/(\alpha\epsilon))\, E[p(t) \mid q(t) = q]$ for all $q$. Taking the expectation with respect to $q$ then proves $E[q_v(t)] \le \left(1 + \frac{\hat{\epsilon}}{\alpha\epsilon}\right) E[p_v(t)]$ as claimed. □
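As a numerical sanity check of the maximization step, the following Python sketch scans the right-hand side of (B.1) over a grid of values of q(t) and compares the largest value found with the closed form 1 + ε̂/(αε). The function names, the grid resolution, and the example parameters (n = 10, γ = 0.1, α = 2) are illustrative assumptions and not part of the original analysis.

```python
import math

def conditional_bound(q, alpha, eps, eps_hat):
    """Right-hand side of (B.1) for a given value q of q(t)."""
    trivial = q / eps  # uses the assumption p(t) >= eps
    if alpha * q <= eps_hat:
        # The first bound needs alpha*q > eps_hat; only the trivial bound applies.
        return trivial
    return min(alpha * q / (alpha * q - eps_hat), trivial)

def worst_case_on_grid(alpha, eps, eps_hat, steps=100_000):
    """Largest value of the bound over a uniform grid of q in [eps, 1]."""
    return max(
        conditional_bound(eps + i * (1.0 - eps) / steps, alpha, eps, eps_hat)
        for i in range(steps + 1)
    )

def closed_form(alpha, eps, eps_hat):
    """Value 1 + eps_hat/(alpha*eps), attained at q = eps_hat/alpha + eps."""
    return 1.0 + eps_hat / (alpha * eps)

# Assumed example: the EXP3 constants of Lemma 1 with n = 10 and gamma = 0.1.
n, gamma, alpha = 10, 0.1, 2.0
eps = gamma / n
eps_hat = (math.e - 1) * gamma / n
grid_max = worst_case_on_grid(alpha, eps, eps_hat)
bound = closed_form(alpha, eps, eps_hat)
```

For these EXP3 constants the closed form reduces to 1 + (e - 1)/α, the factor quoted before Theorem 10.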

LEMMA 1. EXP3 with $\eta = \gamma/n$ satisfies the conditions of Theorem 10 with $\epsilon = \gamma/n$ and $\hat{\epsilon} = (e - 1)\gamma/n$.

Proof. The former equality is an easy observation. To prove the latter equality, fix a round $t$ and a selected action $v$. Let $w_v(t)$ be the weight of $v$ in round $t$, and $W(t)$ be the total weight of all actions in round $t$. Let $\pi$ be the payoff to $v$ in round $t$. Given the update rule $w_v(t+1) = w_v(t) \exp\left(\frac{\eta\pi}{p_v(t)}\right)$, only the probabilities of the other actions will be decreased. It is not hard to see that they will be decreased by a multiplicative factor of at most $W(t)/W(t+1)$, no matter what the learning parameter $\gamma$ is. By the update rule,

$$W(t+1) = W(t) + w_v(t)\left(\exp\left(\frac{\eta\pi}{p_v(t)}\right) - 1\right).$$

Let $p := p_v(t)$ and $x := \eta\pi$. Dividing the above equation by $W(t)$, we get

$$
\begin{aligned}
\frac{W(t+1)}{W(t)} &= 1 + p\left(\exp(x/p) - 1\right) && \text{(B.2)} \\
&\le 1 + p\left(x/p + (e-2)(x/p)^2\right) && \text{(B.3)} \\
&\le 1 + x + (e-2)x^2/p && \text{(B.4)}
\end{aligned}
$$

where in the second line we have used $e^x \le 1 + x + (e-2)x^2$ for $x \in [0, 1]$. Note $\pi \le 1$ implies $x \le \gamma/n \le p$, so $\frac{W(t+1)}{W(t)} \le 1 + (e-1)x \le 1 + (e-1)\frac{\gamma}{n}$. It follows that setting $\hat{\epsilon} = (e-1)\frac{\gamma}{n}$ is sufficient to ensure $p_v(t+1) \ge p_v(t)/(1 + \hat{\epsilon})$ for all $t$. □
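The conditions established by Lemma 1 can also be spot-checked empirically. The sketch below simulates EXP3 with η = γ/n and tests conditions 2 to 4 after every update. The uniformly random payoffs, the helper names, the seed, and the tolerance are assumptions made for illustration; a passing run is only a sanity check and does not replace the proof.

```python
import math
import random

def exp3_probabilities(weights, gamma):
    """EXP3 distribution: normalized weights mixed with the uniform distribution."""
    total = sum(weights)
    n = len(weights)
    return [(1 - gamma) * w / total + gamma / n for w in weights]

def check_lemma1(n=5, gamma=0.2, rounds=2000, seed=0, tol=1e-12):
    """Simulate EXP3 with eta = gamma/n and test conditions 2-4 in every round."""
    rng = random.Random(seed)
    eta = gamma / n
    eps = gamma / n
    eps_hat = (math.e - 1) * gamma / n
    weights = [1.0] * n
    p = exp3_probabilities(weights, gamma)
    for _ in range(rounds):
        v = rng.choices(range(n), weights=p)[0]   # selected action
        payoff = rng.random()                     # assumed payoff model, in [0, 1)
        weights[v] *= math.exp(eta * payoff / p[v])
        top = max(weights)
        weights = [w / top for w in weights]      # rescaling leaves probabilities unchanged
        p_next = exp3_probabilities(weights, gamma)
        for u in range(n):
            if p_next[u] < eps - tol:                     # condition 2
                return False
            if u != v and p_next[u] > p[u] + tol:         # condition 3
                return False
            if p_next[u] < p[u] / (1 + eps_hat) - tol:    # condition 4
                return False
        p = p_next
    return True
```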

We now prove Theorem 4 and Corollary 5. 

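Proof of Theorem 4. We prove in Lemma 1 that EXP3 satisfies the conditions of Theorem 10 with $\epsilon = \gamma/n$ and $\hat{\epsilon} = (e - 1)\gamma/n$. Thus by Theorem 10

$$
\begin{aligned}
E\left[\sum_{v} q_v(t)\right] &\le (1 + (e-1)/\alpha)\, E\left[\sum_{v} p_v(t)\right] \\
&= (1 + (e-1)/\alpha)
\end{aligned}
$$

because $\sum_{v} p_v(t) = 1$. Each sensor $v$ activates with probability $\alpha q_v(t)$, so the expected number of activations is

$$E\left[\alpha \sum_{v} q_v(t)\right] \le \alpha\,(1 + (e-1)/\alpha).$$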



That proves the claimed bounds in expectation. To prove bounds with high probability, note that a sensor activates with probability $\alpha q_v(t)$ in round $t$, where $q_v(t)$ is a random variable. Fix $t$. Let $[\mathcal{E}]$ denote the indicator variable for the event $\mathcal{E}$, i.e., $[\mathcal{E}] = 1$ if $\mathcal{E}$ occurs, and $[\mathcal{E}] = 0$ otherwise. Then we can write $[v \text{ activates in round } t] = [\alpha q_v(t) \ge R]$, where $R$ is sampled uniformly at random from $[0, 1]$ and $R$ is independent of $q_v(t)$. Then if $f_R$ is the probability density function of $R$ we can write

$$
\begin{aligned}
\Pr[R \le \alpha q_v(t)] &= \int_{0}^{1} \Pr[\alpha q_v(t) \ge R \mid R = r]\, f_R(r)\,dr \\
&= \int_{0}^{1} \Pr[\alpha q_v(t) \ge r]\, f_R(r)\,dr \\
&= \int_{0}^{1} \Pr[\alpha q_v(t) \ge r]\,dr \\
&= E[\alpha q_v(t)]
\end{aligned}
$$

Thus the number of sensor activations is a sum of $|V|$ binary random variables with cumulative mean $\mu := \sum_{v} E[\alpha q_v(t)]$. We have already bounded this mean as $\mu \le \alpha + (e - 1)$. From here a simple application of a Chernoff-Hoeffding bound suffices to prove that with high probability this sum is at most $O(\alpha + \log n)$. Let $A$ be the number of sensor activations. Then, e.g., Theorem 5 of [7] immediately implies

$$\Pr[A \ge \mu(1 + \delta)] \le \exp\left(-\frac{\delta^2 \mu}{2(1 + \delta/3)}\right).$$

For $\delta \ge 1$, this yields $\Pr[A \ge \mu(1 + \delta)] \le \exp\left(-\frac{3\delta\mu}{8}\right)$. Setting $\delta = \frac{8 c \ln n}{3\mu}$ ensures this probability is at most $n^{-c}$, hence $\Pr[A \ge 2\mu + \frac{8}{3} c \ln n] \le n^{-c}$. Noting that $\mu \le \alpha + (e - 1)$ completes the high probability bound on the number of activations.

As for the number of messages, note that each message 
involves a sensor as sender or receiver, and by inspection the 
protocol only involves two messages per activated node. □ 

Proof of Corollary 5. Use the distributed EXP3 protocol with lazy renormalization with $\alpha = \ln n$. We have already established that the probability of nothing being selected is $e^{-\alpha}$, or $1/n$ in this case. If nothing is selected, send out $n$ messages, one to each sensor, to rerun the protocol. The expected number of messages sent to initiate additional runs of the protocol is $\sum_{x=1}^{\infty} n x / n^x = (1 - 1/n)^{-2} = 1 + O(1/n)$. Let $X$ be the number of sensor activations. As in the proof of Proposition 8, if $Y$ is the number of sensor activations without rerunning the protocol when nothing is selected, then $E[X] = E[Y]/\Pr[Y \ge 1]$. By Theorem 4, $E[Y] \le \alpha\,(1 + (e - 1)/\alpha)$. Since $\Pr[Y \ge 1] = 1 - e^{-\alpha}$, we conclude

$$E[X] \le \ln n + (e - 1) + O\left(\frac{\ln n}{n}\right).$$

The with-high-probability bounds on the number of sensor activations are proved as in the proof of Theorem 4.

As for the number of messages, note that other than mes- 
sages sent to initiate additional runs of the protocol, there 
are only two messages per activated node. Finally, the regret 
bounds for distributed EXP3 are the same as standard EXP3 
because by design the two algorithms select sensors from 
exactly the same distribution in each round. Note that the distribution in any given round is a random object depending on the algorithm's choices in the previous rounds; however, on each round the distribution on distributions is the same for both EXP3 variants, as can be readily proved by induction on the round number. □
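The two closed-form quantities used in the proof of Corollary 5 are easy to check numerically. The sketch below sums the series for the expected number of restart messages and evaluates the activation bound (α + e - 1)/(1 - e^{-α}) at α = ln n; the function names and the truncation at 200 terms are assumptions made for illustration.

```python
import math

def expected_restart_messages(n, terms=200):
    """Partial sum of sum_{x>=1} n*x/n^x, the expected number of restart messages."""
    return sum(n * x * (1.0 / n) ** x for x in range(1, terms + 1))

def restart_messages_closed_form(n):
    """Closed form (1 - 1/n)^(-2) of the series above."""
    return (1.0 - 1.0 / n) ** -2

def expected_activations_bound(n):
    """(alpha + e - 1) / (1 - e^(-alpha)) evaluated at alpha = ln n."""
    alpha = math.log(n)
    return (alpha + math.e - 1) / (1.0 - math.exp(-alpha))
```

The gap between `expected_activations_bound(n)` and ln n + (e - 1) shrinks on the order of (ln n)/n, in line with the stated bound on E[X].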

C. ALGORITHM OG_UNIT WITH FAULTY ACTIONS

In order to prove Theorem 7, we need a guarantee on the performance of OG_UNIT if its elements may fail to give any benefit. We provide this in the form of Theorem 11.

Suppose we run DOG with the Poisson Multinomial Sampling Protocol with lazy renormalization, and do not resample on stages where no sensor activates. Then with some probability during any given stage $i \in [k]$, no sensors activate and the server receives no information. Suppose that this probability is at most $\delta$ in each stage. We have shown in Section 4.1 that $\delta \le e^{-\alpha}$ where $\alpha$ is the oversampling parameter. We claim that we can compensate for this possibility by running DOG for $k/(1 - \delta)$ stages in each round rather than $k$, because of the following guarantee for OG_UNIT.

THEOREM 11. Fix finite set $V$, $k \in \mathbb{N}$, and a sequence of monotone submodular functions $f_1, \dots, f_T : 2^V \to [0, 1]$. Let $\mathrm{OPT}_k = \max_{S \subseteq V, |S| \le k} \sum_{t=1}^{T} f_t(S)$. For all $v \in V$ let $v'$ be a random element which is $v$ with probability $1 - \delta_v$ and is null⁷ with probability $\delta_v$. Let $f'_t(S') := E[f_t(S)]$ where $S$ is the set obtained by including every element $v'$ of $S'$ in it independently with probability $1 - \delta_v$. Let $S'_1, \dots, S'_T$ be the sequence of random sets obtained from running OG_UNIT with actions $V' := \{v' : v \in V\}$ and objective functions $\{f'_t\}_{t=1}^{T}$ and $k' = \lceil k/(1 - \delta) \rceil$ stages, where $\delta = \max_v \delta_v$. Suppose the algorithms for each stage have expected regret at most $r$. Then

$$E\left[\sum_{t=1}^{T} f'_t(S'_t)\right] \ge \left(1 - \frac{1}{e}\right) \mathrm{OPT}_k - \left\lceil \frac{k}{1 - \delta} \right\rceil r.$$

⁷Here, a null element always contributes nothing in the way of utility, so that $f_t(S \cup \{\text{null}\}) = f_t(S)$ for all $t$ and $S$.



Proof. It suffices to prove the analogous result in the offline case; the "meta-actions" analysis in [27] can then be used to complete the proof. So consider a set of elements $V$ and the "faulty" versions $V'$. Fix a monotone submodular $f : 2^V \to [0, 1]$ and define $f'$ as above. Run the offline greedy algorithm on $f'$ to try to find the best set of $k' = \lceil k/(1 - \delta) \rceil$ elements in $V'$. Let $g'_i$ be the chosen element in stage $i$, and let $G'_i = \{g'_j : 1 \le j \le i\}$. Let $G_i$ denote the realization of $G'_i$ after sampling, so that $G_i \subseteq \{g : g' \in G'_i\}$. Let $S^* = \operatorname{argmax}_{S \subseteq V, |S| \le k} f(S)$. We claim that for all $i$

$$E\left[f(G'_{i+1}) - f(G'_i) \mid G_i\right] \ge (1 - \delta)\, \frac{f(S^*) - f(G_i)}{k}$$

because

$$
\begin{aligned}
f(S^*) - f(G_i) &\le f(G_i \cup S^*) - f(G_i) \\
&\le \sum_{v \in S^*} \left(f(G_i + v) - f(G_i)\right) \\
&\le k \cdot \max_{v} \left(f(G_i + v) - f(G_i)\right)
\end{aligned}
$$

and $\max_{v'} \left(E[f(G'_i + v') - f(G'_i) \mid G_i]\right)$ is at least equal to $(1 - \delta) \max_{v} \left(f(G_i + v) - f(G_i)\right)$. Removing the conditioning on $G_i$ we get

$$E\left[f(G'_{i+1}) - f(G'_i)\right] \ge (1 - \delta)\, \frac{f(S^*) - E[f(G'_i)]}{k}.$$

Let $\Phi(i) := f(S^*) - E[f(G'_i)]$. The previous equation implies $\Phi(i+1) \le \Phi(i)\left(1 - \frac{1 - \delta}{k}\right)$. By induction $\Phi(i) \le f(S^*)\left(1 - \frac{1 - \delta}{k}\right)^{i}$. Using $1 - x \le e^{-x}$ we conclude that $\Phi(\lceil k/(1 - \delta) \rceil) \le f(S^*)/e$ and $f'(G'_{k'}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$. □
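The last step of the induction can be checked numerically: after ⌈k/(1 - δ)⌉ stages the residual factor (1 - (1 - δ)/k) raised to that power is at most 1/e. The following sketch, whose function names are hypothetical, evaluates this factor for given k and δ.

```python
import math

def stages_needed(k, delta):
    """Number of stages ceil(k / (1 - delta)) used to compensate for failures."""
    return math.ceil(k / (1.0 - delta))

def residual_fraction(k, delta):
    """Upper bound (1 - (1 - delta)/k)^(stages) on Phi(stages) / f(S*)."""
    return (1.0 - (1.0 - delta) / k) ** stages_needed(k, delta)
```

For example, with k = 4 and δ = 0.5 the compensation doubles the number of stages to 8, and the residual fraction is (1 - 1/8)^8, which is below 1/e.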

We are now ready to prove Theorem 7. 

Proof of Theorem 7. To bound the number of sensor activations, we note there are $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ rounds, and each round activates at most $\alpha + (e - 1)$ sensors in expectation and $O(\alpha + \log n)$ sensors with high probability by Theorem 4 (which proves these bounds in the higher communication case where we do rerun the PMS Protocol if nothing is selected). This, and the fact that $\alpha/(1 - e^{-\alpha}) = O(\alpha)$ for $\alpha > 0$, yields the claimed activation bounds. It is an easy observation that the number of messages is at most twice the number of activations. Clearly, at most one sensor per stage is activated, so at most $k'$ are activated over one round. Finally, the regret bound follows from Theorem 11, using $\delta = e^{-\alpha}$. □
ABSTRACT

A key problem in sensor networks is to decide which sensors 
to query when, in order to obtain the most useful information 
(e.g., for performing accurate prediction), subject to con- 
straints (e.g., on power and bandwidth). In many applications 
the utility function is not known a priori, must be learned 
from data, and can even change over time. Furthermore for 
large sensor networks solving a centralized optimization prob- 
lem to select sensors is not feasible, and thus we seek a fully 
distributed solution. In this paper, we present Distributed 
Online Greedy (DOG), an efficient, distributed algorithm for 
repeatedly selecting sensors online, only receiving feedback 
about the utility of the selected sensors. We prove very strong 
theoretical no-regret guarantees that apply whenever the (un- 
known) utility function satisfies a natural diminishing returns 
property called submodularity . Our algorithm has extremely 
low communication requirements, and scales well to large 
sensor deployments. We extend DOG to allow observation- 
dependent sensor selection. We empirically demonstrate the 
effectiveness of our algorithm on several real-world sensing 
tasks. 

Categories and Subject Descriptors 

C.2.1 [Computer-Communication Networks]: Network Architecture and Design; G.3 [Probability and Statistics]: Experimental Design; I.2.6 [AI]: Learning

General Terms 

Algorithms, Measurement 

Keywords

Sensor networks, approximation algorithms, distributed multiarmed bandit algorithms, submodular optimization

1. INTRODUCTION

A key challenge in deploying sensor networks for real- 
world applications such as environmental monitoring [?], 
building automation [?] and others is to decide when to acti- 
vate the sensors in order to obtain the most useful information 
from the network (e.g., accurate predictions at unobserved 
locations) and to minimize power consumption. This sensor 
selection problem has received considerable attention [?, ?, 
?], and algorithms with performance guarantees have been 
developed [?, ?]. However, many of the existing approaches 
make simplifying assumptions. Many approaches assume (1) 
that the sensors can perfectly observe a particular sensing 
region, and nothing outside the region [?]. This assumption 
does not allow us to model settings where multiple noisy 
sensors can help each other obtain better predictions. There 
are also approaches that base their notion of utility on more 
detailed models, such as improvement in prediction accuracy 
w.r.t. some statistical model [?] or detection performance [?]. 
However, most of these approaches make two crucial assump- 
tions: (2) The model, upon which the optimization is based, is 
known in advance (e.g., based on domain knowledge or data 
from a pilot deployment) and (3), a centralized optimization 
selects the sensors (i.e., some centralized processor selects the 
sensors which obtain highest utility w.r.t. the model). We are 
not aware of any approach that simultaneously addresses the 
three main challenges (1), (2) and (3) above and still provides 
theoretical guarantees. 

In this paper, we develop an efficient algorithm, called 
Distributed Online Greedy (DOG), which addresses these 
three central challenges. Prior work [?] has shown that many 
sensing tasks satisfy an intuitive diminishing returns property, 
submodularity, which states that activating a new sensor helps 
more if few sensors have been activated so far, and less if 
many sensors have already been activated. Our algorithm ap- 
plies to any setting where the true objective is submodular [?], 
thus capturing a variety of realistic sensor models. Secondly, 
our algorithm does not require the model to be specified in 
advance: it learns to optimize the objective function in an online
manner. Lastly, the algorithm is distributed; the sensors
decide whether to activate themselves based on local infor- 
mation. We analyze our algorithm in the no-regret model, 
proving convergence properties similar to the best bounds for 
any centralized solution. 

A bandit approach toward sensor selection. At the heart 
of our approach is a novel distributed algorithm for multi- 
armed bandit (MAB) problems. In the classical multiarmed 
bandit [?] setting, we picture a slot machine with multiple 
arms, where each arm generates a random payoff with un- 
known mean. Our goal is to devise a strategy for pulling 
arms to maximize the total reward accrued. The difference 
between the optimal arm's payoff and the obtained payoff 
is called the regret. Known algorithms can achieve average 
per-round regret of $O(\sqrt{n \log n}/\sqrt{T})$, where n is the number
of arms, and T the number of rounds (see e.g. the survey of 
[?]). Suppose we would like to, at every time step, select k 
sensors. The sensor selection problem can then be cast as a 
multiarmed bandit problem, where there is one arm for each 
possible set of k sensors, and the payoff is the accrued utility 
for the selected set. Since the number of possible sets, and 
thus the number of arms, is exponentially large, the resulting 
regret bound is $O(n^{k/2} \sqrt{\log n}/\sqrt{T})$, i.e., exponential in k.
However, when the utility function is submodular, the payoffs 
of these arms are correlated. Recent results [?] show that this 
correlation due to submodularity can be exploited by reducing 
the $n^k$-armed bandit problem to k separate n-armed bandit
problems, with only a bounded loss in performance. Existing 
bandit algorithms, such as the widely used EXP3 algorithm 
[?], are centralized in nature. Consequently, the key challenge 
in distributed online submodular sensing is how to devise a 
distributed bandit algorithm. In Sec. 4 and 5, we develop a 
distributed variant of EXP3 using novel algorithms to sample 
from and update a probability distribution in a distributed way. 
Roughly, we develop a scheme where each sensor maintains 
its own weight, and activates itself independently from all 
other sensors purely depending on this weight. 

Observation specific selection. A shortcoming of central- 
ized sensor selection is that the individual sensors' current 
measurements are not considered in the selection process. 
In many applications, obtaining sensor measurements is less 
costly than transmitting the measurements across the network. 
For example, cell phones used in participatory sensing [?] can 
inexpensively obtain measurements on a regular basis, but it 
is expensive to constantly communicate measurements over 
the network. In Sec. 6, we extend our distributed selection 
algorithm to activate sensors depending on their observations, 
and analyze the tradeoff between power consumption and the 
utility obtained under observation specific activation. 

Communication models. We analyze our algorithms under two models of communication cost. In the broadcast model, each sensor can broadcast a message to all other sensors at unit cost. In the star network model, messages can only be exchanged between a sensor and the base station, and each message has unit cost. In Sec. 4 we formulate and analyze a distributed algorithm for sensor selection under the simpler broadcast model. Then, in Sec. 5 we show how the algorithm can be extended to the star network model.

Our main contributions. 

• Distributed EXP3, a novel distributed implementation 
of the classic multiarmed bandit algorithm. 

• Distributed Online Greedy (DOG) and LAZYDOG, 
novel algorithms for distributed online sensor selection, 
which apply to many settings, only requiring the utility 
function to be submodular. 

• OD-DOG, an extension of DOG to allow for observation- 
dependent selection. 

• We analyze our algorithm in the no-regret model and 
prove that it attains the optimal regret bounds attainable 
by any efficient centralized algorithm. 

• We evaluate our approach on several real-world sensing 
tasks including monitoring a 12,527 node network. 

Finally, while we do not consider multi-hop or general 
network topologies in this paper, we believe that the ideas 
behind our algorithms will likely prove valuable for sensor 
selection in those models as well. 

2. THE SENSOR SELECTION PROBLEM 

We now formalize the sensor selection problem. Suppose a 
network of sensors has been deployed at a set of locations V 
with the task of monitoring some phenomenon (e.g., tempera- 
ture in a building). Constraints on communication bandwidth 
or battery power typically require us to select a subset A of 
these sensors for activation, according to some utility func- 
tion. The activated sensors then send their data to a server 
(base station). We first review the traditional offline setting 
where the utility function is specified in advance, illustrating 
how submodularity allows us to obtain provably near-optimal 
selections. We then address the more challenging setting 
where the utility function must be learned from data in an 
online manner. 

2.1 The Offline Sensor Selection Problem 

A standard offline sensor selection algorithm chooses a 
set of sensors that maximizes a known sensing quality ob- 
jective function f(A), subject to some constraints, e.g., on 
the number of activated sensors. One possible choice for 
the sensing quality is based on prediction accuracy (we will 
discuss other possible choices later on). In many applications, 
measurements are correlated across space, which allows us to 
make predictions at the unobserved locations. For example, prior work [?] has considered the setting where a random variable $X_s$ is associated with every location $s \in V$, and a joint probability distribution $P(X_V)$ models the correlation between sensor values. Here, $X_V = [X_1, \dots, X_n]$ is the random vector over all measurements. If some measurements $X_A = x_A$ are obtained at a subset of locations, then the conditional distribution $P(X_{V \setminus A} \mid X_A = x_A)$ allows predictions at the unobserved locations, e.g., by predicting $E[X_{V \setminus A} \mid X_A = x_A]$. Furthermore, this conditional distribution quantifies the uncertainty in the prediction: Intuitively, we would like to select sensors that minimize the predictive uncertainty. One way to quantify the predictive uncertainty is the mean squared prediction error,

$$\mathrm{MSE}(X_{V \setminus A} \mid x_A) = \frac{1}{n} \sum_{s \in V \setminus A} E\left[(X_s - E[X_s \mid x_A])^2 \mid x_A\right].$$

In general, the measurements $x_A$ that sensors $A$ will make are not known in advance. Thus, we can base our optimization on the expected mean squared prediction error,

$$\mathrm{EMSE}(A) = \int dp(x_A)\, \mathrm{MSE}(X_{V \setminus A} \mid x_A).$$

Equivalently, we can maximize the reduction in mean squared prediction error,

$$f_{\mathrm{EMSE}}(A) = \mathrm{EMSE}(\emptyset) - \mathrm{EMSE}(A).$$

By definition, $f_{\mathrm{EMSE}}(\emptyset) = 0$, i.e., no sensors obtain no utility. Furthermore, $f_{\mathrm{EMSE}}$ is monotonic: if $A \subseteq B \subseteq V$, then $f_{\mathrm{EMSE}}(A) \le f_{\mathrm{EMSE}}(B)$, i.e., adding more sensors always helps. That means $f_{\mathrm{EMSE}}$ is maximized by the set of all sensors $V$. However, in practice, we would like to only select a small set of, e.g., at most $k$ sensors due to bandwidth and power constraints:

$$A^* = \operatorname*{argmax}_{A} f_{\mathrm{EMSE}}(A) \quad \text{s.t.} \quad |A| \le k.$$

Unfortunately, this optimization problem is NP-hard, so we cannot expect to efficiently find the optimal solution. Fortunately, it can be shown [?] that in many settings¹, the function $f_{\mathrm{EMSE}}$ satisfies an intuitive diminishing returns property called submodularity. A set function $f : 2^V \to \mathbb{R}$ is called submodular if, for all $A \subseteq B \subseteq V$ and $s \in V \setminus B$ it holds that $f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$. Many other natural objective functions for sensor selection satisfy submodularity as well [?]. For example, the sensing region model where $f_{\mathrm{REG}}(A)$ is the total area covered by all sensors $A$ is submodular. The detection model where $f_{\mathrm{DET}}(A)$ counts the expected number of targets detected by sensors $A$ is submodular as well.
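The diminishing returns condition can be verified by brute force on small ground sets. The sketch below checks the inequality f(A ∪ {s}) - f(A) ≥ f(B ∪ {s}) - f(B) for every nested pair A ⊆ B and every s outside B. The toy sensing regions and all function names are assumptions chosen for illustration; the check is exponential in |V| and is only meant for tiny examples.

```python
from itertools import combinations

def covered_area(regions, A):
    """Sensing-region utility: number of grid cells covered by the sensors in A."""
    covered = set()
    for s in A:
        covered |= regions[s]
    return len(covered)

def is_submodular(f, ground):
    """Brute-force test of f(A+s) - f(A) >= f(B+s) - f(B) for all A <= B, s not in B."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for s in ground:
                if s in B:
                    continue
                if f(A | {s}) - f(A) < f(B | {s}) - f(B):
                    return False
    return True

# Hypothetical instance: four sensors, each covering a few numbered cells.
REGIONS = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}, 3: {1, 6}}
coverage_ok = is_submodular(lambda A: covered_area(REGIONS, A), REGIONS)
# A function with increasing returns, such as |A|^2, fails the test.
square_ok = is_submodular(lambda A: len(A) ** 2, REGIONS)
```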

¹For Gaussian models and conditional suppressorfreeness [?]

A fundamental result of Nemhauser et al. [?] is that for monotone submodular functions, a simple greedy algorithm, which starts with the empty set $A_0 = \emptyset$ and iteratively adds the element

$$s_k = \operatorname*{argmax}_{s \in V \setminus A_{k-1}} f(A_{k-1} \cup \{s\}); \qquad A_k = A_{k-1} \cup \{s_k\}$$

which maximally improves the utility, obtains a near-optimal solution: For the set $A_k$ it holds that

$$f(A_k) \ge (1 - 1/e) \max_{|A| \le k} f(A),$$

i.e., the greedy solution obtains at least a constant fraction of $(1 - 1/e) \approx 63\%$ of the optimal value.
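The greedy rule above can be sketched in a few lines of Python. The coverage utility, the region data, and the brute-force comparison are hypothetical and serve only to illustrate the (1 - 1/e) guarantee on a tiny instance.

```python
import math
from itertools import combinations

def greedy_select(f, ground, k):
    """Offline greedy: repeatedly add the element that maximizes f(A + s)."""
    selected = []
    for _ in range(k):
        candidates = [s for s in ground if s not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda s: f(selected + [s]))
        selected.append(best)
    return selected

def brute_force_optimum(f, ground, k):
    """Best value over all sets of size at most k (exponential; tiny inputs only)."""
    return max(f(list(S)) for r in range(k + 1) for S in combinations(ground, r))

# Hypothetical sensing regions: each sensor covers a set of numbered cells.
REGIONS = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1, 6, 7}}

def covered_area(A):
    """Monotone submodular utility: number of cells covered by the sensors in A."""
    return len(set().union(*(REGIONS[s] for s in A)))

chosen = greedy_select(covered_area, list(REGIONS), 2)
optimum = brute_force_optimum(covered_area, list(REGIONS), 2)
```

On this instance greedy first picks sensor "a", which covers the most cells on its own, and then a sensor that adds two new cells.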

One fundamental problem with this offline approach is that it requires the function $f$ to be specified in advance, i.e., before running the greedy algorithm. For the function $f_{\mathrm{EMSE}}$, this means that the probabilistic model $P(X_V)$ needs to be known in advance. While for some applications some prior data, e.g., from pilot deployments, may be accessible, very often no such prior data is available. This leads to a "chicken-and-egg" problem, where sensors need to be activated to collect data in order to learn a model, but the model is also required to inform the sensor selection. This is akin to the "exploration-exploitation tradeoff" in reinforcement learning [?], where an agent needs to decide whether to explore and gather information about the effectiveness of an action, or to exploit, i.e., choose actions known to be effective. In the following, we devise an online monitoring scheme based on this analogy.

2.2 The Online Sensor Selection Problem 

We now consider the more challenging problem where the objective function is not specified in advance, and needs to be learned during the monitoring task. We assume that we intend to monitor the environment for a number $T$ of time steps (rounds). In each round $t$, a set $S_t$ of sensors is selected, and these sensors transmit their measurements to a server (base station). The server then determines a sensing quality $f_t(S_t)$ quantifying the utility obtained from the resulting analysis. For example, if our goal is spatial prediction, the server would build a model based on the previously collected sensor data, pick a random sensor $s$, make a prediction for the variable $X_s$, and then compare the prediction $\mu_s$ with the sensor reading $x_s$. The error $f_t = \sigma^2 - (\mu_s - x_s)^2$ is an unbiased estimate of the reduction in EMSE. In the following analysis, we will only assume that the objective functions $f_t$ are bounded (w.l.o.g., take values in $[0, 1]$), monotone, and submodular, and that we have some way of computing $f_t(S)$ for any subset of sensors $S$. Our goal is to maximize the total reward obtained by the system over $T$ rounds, $\sum_{t=1}^{T} f_t(S_t)$.

We seek to develop a protocol for selecting the sets $S_t$ of sensors at each round, such that after a small number of rounds the average performance of our online algorithm converges to the same performance of the offline strategy (that knows the objective functions). We thus compare our protocol against all strategies that can select a fixed set of $k$ sensors for use in all of the rounds; the best such strategy obtains reward $\max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S)$. The difference between this quantity and what our protocol obtains is known as its regret, and an algorithm is said to be no-regret if its average regret tends to zero (or less)² as $T \to \infty$.

When $k = 1$, our problem is simply the well-studied multiarmed bandit (MAB) problem, for which many no-regret algorithms are known [?]. For general $k$, because the average of several submodular functions remains submodular, we can apply the result of Nemhauser et al. [?] (cf., Sec. 2.1) to prove that a simple greedy algorithm obtains a $(1 - 1/e)$ approximation to the optimal offline solution. Feige [?] showed that this is optimal in the sense that obtaining a $(1 - 1/e + \epsilon)$ approximation for any $\epsilon > 0$ is NP-hard. These facts suggest that we cannot expect any efficient online algorithm to converge to a solution better than $(1 - 1/e) \max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S)$. We therefore define the $(1 - 1/e)$-regret of a sequence of (possibly random) sets as

$$R_T := (1 - 1/e) \cdot \max_{S \subseteq V : |S| \le k} \sum_{t=1}^{T} f_t(S) - \sum_{t=1}^{T} E[f_t(S_t)]$$

where the expectation is taken over the distribution for each $S_t$. We say an online algorithm producing a sequence of sets has no-$(1 - 1/e)$-regret if $\limsup_{T \to \infty} R_T/T \le 0$.
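For a small ground set, the (1 - 1/e)-regret of a concrete run can be computed directly from this definition. The sketch below does so by brute force over all fixed sets of size at most k. It uses the realized rewards of one run in place of the expectation, and the function name and toy utility are assumptions made for illustration.

```python
import math
from itertools import combinations

def one_minus_inv_e_regret(utilities, chosen_sets, ground, k):
    """(1 - 1/e) * (best fixed set's total reward) minus the realized total reward."""
    best_fixed = max(
        sum(f(frozenset(S)) for f in utilities)
        for r in range(k + 1)
        for S in combinations(ground, r)
    )
    achieved = sum(f(frozenset(S)) for f, S in zip(utilities, chosen_sets))
    return (1 - 1 / math.e) * best_fixed - achieved

# Hypothetical utility: saturates once two sensors are active (monotone, submodular).
def toy_utility(S):
    return min(1.0, len(S) / 2)

GROUND = [0, 1, 2]
ROUNDS = [toy_utility] * 4
regret_good = one_minus_inv_e_regret(ROUNDS, [{0, 1}] * 4, GROUND, k=2)
regret_idle = one_minus_inv_e_regret(ROUNDS, [set()] * 4, GROUND, k=2)
```

A run that always plays an optimal set has negative (1 - 1/e)-regret, which is why the definition of no-regret allows the limit to be zero or less.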

3. CENTRALIZED ALGORITHM FOR ONLINE SENSOR SELECTION

Before developing the distributed algorithm for online sen- 
sor selection, we will first review a centralized algorithm 
which is guaranteed to achieve no-$(1 - 1/e)$-regret. In Sec.
4 we will show how this centralized algorithm can be imple- 
mented efficiently in a distributed manner. This algorithm 
starts with the greedy algorithm for a known submodular 
function mentioned in Sec. 2.1, and adapts it to the online 
setting. Doing so requires an online algorithm for selecting 
a single sensor as a subroutine, and we review such an algo- 
rithm in Sec. 3.1 before discussing the centralized algorithm 
for selecting multiple sensors in Sec. 3.2. 

3.1 Centralized Online Single Sensor Selection 

Let us first consider the case where $k = 1$, i.e., we would like to select one sensor at each round. This simpler problem can be interpreted as an instance of the multiarmed bandit problem (as introduced in Sec. 2.2), where we have one arm for each possible sensor. In this case, the EXP3 algorithm [?] is a centralized solution for no-regret single sensor selection. EXP3 works as follows: It is parameterized by a learning rate $\eta$, and an exploration probability $\gamma$. It maintains a set of weights $w_s$, one for each arm (sensor) $s$, initialized to 1. At every round $t$, it will select each arm $s$ with probability

$$p_s = (1 - \gamma)\, \frac{w_s}{\sum_{s'} w_{s'}} + \frac{\gamma}{n},$$

i.e., with probability $\gamma$ it explores, picking an arm uniformly at random, and with probability $(1 - \gamma)$ it exploits, picking an arm $s$ with probability proportional to its weight $w_s$. Once an arm $s$ has been selected, a feedback $r = f_t(\{s\})$ is obtained, and the weight $w_s$ is updated to

$$w_s \leftarrow w_s \exp(\eta r / p_s).$$

Auer et al. [?] showed that with appropriately chosen learning rate $\eta$ and exploration probability $\gamma$ it holds that the cumulative regret $R_T$ of EXP3 is $O(\sqrt{T n \ln n})$, i.e., the average regret $R_T/T$ converges to zero.

²Formally, if $R_T$ is the total regret for the first $T$ rounds, no-regret means $\limsup_{T \to \infty} R_T/T \le 0$.
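A minimal Python sketch of EXP3 as described in this section may help make the sampling and update steps concrete. The class and method names, the weight rescaling (added only to avoid floating-point overflow; it does not change the probabilities), and the toy payoff in which only arm 2 pays are assumptions for illustration rather than the authors' implementation.

```python
import math
import random

class Exp3:
    """EXP3 with learning rate eta and exploration probability gamma."""

    def __init__(self, n_arms, eta, gamma, rng=None):
        self.n = n_arms
        self.eta = eta
        self.gamma = gamma
        self.weights = [1.0] * n_arms          # one weight per arm, initialized to 1
        self.rng = rng or random.Random()

    def probabilities(self):
        """p_s = (1 - gamma) * w_s / sum(w) + gamma / n."""
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        """Sample one arm; return it with its selection probability."""
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        """w_s <- w_s * exp(eta * r / p_s), then rescale so the largest weight is 1."""
        self.weights[arm] *= math.exp(self.eta * reward / prob)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

# Toy run (assumed payoff model): only arm 2 yields reward.
learner = Exp3(n_arms=4, eta=0.025, gamma=0.1, rng=random.Random(1))
for _ in range(3000):
    arm, prob = learner.select()
    learner.update(arm, prob, 1.0 if arm == 2 else 0.0)
final_probs = learner.probabilities()
```

Because of the exploration term, every arm keeps probability at least γ/n, which is the property used as condition 2 in the appendix.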

3.2 Centralized Selection of Multiple Sensors 

In principle, we could interpret the sensor selection problem as a $\binom{n}{k}$-armed bandit problem, and apply existing no-regret algorithms such as EXP3. Unfortunately, this approach does not scale, since the number of arms grows exponentially with $k$. However, in contrast to the traditional multiarmed bandit problem, where the arms are assumed to have independent payoffs, in the sensor selection case, the utility function is submodular and thus the payoffs are correlated across different sets. Recently, Streeter and Golovin showed how this submodularity can be exploited, and developed a no-$(1 - 1/e)$-regret algorithm for online maximization of submodular functions [?]. The key idea behind their algorithm, OG_UNIT, is to turn the offline greedy algorithm into an online algorithm by replacing the greedy selection of the element that maximizes the benefit, $s_k = \operatorname{argmax}_s f(\{s_1, \dots, s_{k-1}\} \cup \{s\})$, by a bandit algorithm. As shown in the pseudocode below, OG_UNIT maintains $k$ bandit algorithms, one for each sensor to be selected. At each round $t$, it selects $k$ sensors according to the choices of the $k$ bandit algorithms $\mathcal{E}_i$³. Once the elements have been selected, the $i$th bandit algorithm $\mathcal{E}_i$ receives as feedback the incremental benefit $f_t(s_1, \dots, s_i) - f_t(s_1, \dots, s_{i-1})$, i.e., how much additional utility is obtained by adding sensor $s_i$ to the set of already selected sensors. Below we define $[m] := \{1, 2, \dots, m\}$.

Algorithm OG_UNIT from [?]:

Initialize k multiarmed bandit algorithms $\mathcal{E}_1, \mathcal{E}_2, \dots, \mathcal{E}_k$, each with action set V.
For each round $t \in [T]$
    For each stage $i \in [k]$ in parallel
        $\mathcal{E}_i$ selects an action $v_i^t$
    For each $i \in [k]$ in parallel
        feedback $f_t(\{v_j^t : j \le i\}) - f_t(\{v_j^t : j < i\})$ to $\mathcal{E}_i$
    Output $S_t = \{v_1^t, v_2^t, \dots, v_k^t\}$.

³Bandits with duplicate choices are handled in Sec. 4.6.1 of [?]



In [?] it is shown that OG_UNIT has a $(1 - \frac{1}{e})$-regret bound of $O(kR)$ in this feedback model assuming each $\mathcal{E}_i$ has expected regret at most $R$. Thus, when using EXP3 as a subroutine, OG_UNIT has no-$(1 - 1/e)$-regret.
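The structure of OG_UNIT with EXP3 as the per-stage subroutine can be sketched as follows. This is a centralized toy version under several assumptions: the class and function names, the coverage utility, and the parameter values are hypothetical, and duplicate picks are simply treated as contributing nothing, which is a simplification of the handling referenced in the footnote.

```python
import math
import random

class Exp3:
    """Minimal EXP3 learner; weights are rescaled to avoid overflow."""

    def __init__(self, n_arms, eta, gamma, rng):
        self.n, self.eta, self.gamma, self.rng = n_arms, eta, gamma, rng
        self.weights = [1.0] * n_arms

    def select(self):
        total = sum(self.weights)
        probs = [(1 - self.gamma) * w / total + self.gamma / self.n
                 for w in self.weights]
        arm = self.rng.choices(range(self.n), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        self.weights[arm] *= math.exp(self.eta * reward / prob)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]

def og_unit(utilities, n, k, eta, gamma, seed=0):
    """Run OG_UNIT over a list of per-round utilities f_t; return sets and rewards."""
    rng = random.Random(seed)
    bandits = [Exp3(n, eta, gamma, rng) for _ in range(k)]
    chosen, rewards = [], []
    for f in utilities:
        picks = [b.select() for b in bandits]            # stage i picks its action
        prefix, previous = set(), f(frozenset())
        for bandit, (arm, prob) in zip(bandits, picks):
            prefix.add(arm)                              # a duplicate pick adds nothing
            current = f(frozenset(prefix))
            bandit.update(arm, prob, current - previous) # incremental benefit
            previous = current
        chosen.append(frozenset(prefix))
        rewards.append(previous)
    return chosen, rewards

# Hypothetical coverage utility on n = 4 sensors, normalized to [0, 1].
REGIONS = {0: {1, 2, 3, 4}, 1: {3, 4, 5}, 2: {5, 6}, 3: {1, 6, 7}}

def coverage_utility(S):
    covered = set().union(*(REGIONS[s] for s in S)) if S else set()
    return len(covered) / 7.0

chosen, rewards = og_unit([coverage_utility] * 500, n=4, k=2, eta=0.05, gamma=0.2)
```

Since the incremental benefits telescope, the feedback handed to the k bandits in a round sums to the utility of the selected set.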

Unfortunately, EXP3 (and in fact all MAB algorithms with no-regret guarantees for non-stochastic reward functions) requires sampling from some distribution with weights associated with the sensors. If $n$ is small, we could simply store these weights on the server, and run the bandit algorithms $\mathcal{E}_i$ there. However, this solution does not scale to large numbers of sensors. Thus the key problem for online sensor selection is to develop a multiarmed bandit algorithm which implements distributed sampling across the network, with minimal overhead of communication. In addition, the algorithm needs to be able to maintain the distributions (the weights) associated with each $\mathcal{E}_i$ in a distributed fashion.

4. DISTRIBUTED ALGORITHM FOR 
ONLINE SENSOR SELECTION 

We will now develop DOG, an efficient algorithm for dis- 
tributed online sensor selection. For now we make the follow- 
ing assumptions: 

1. Each sensor $v \in V$ is able to compute its contribution to the utility $f_t(S \cup \{v\}) - f_t(S)$, where $S$ is the subset of sensors that have already been selected.

2. Each sensor can broadcast to all other sensors. 

3. The sensors have calibrated clocks and unique, linearly 
ordered identifiers. 

These assumptions are reasonable in many applications: (1) In target detection, for example, the objective function $f_t(S)$ counts the number of targets detected by the sensors $S$. Once previously selected sensors have broadcast which targets they detected, the new sensor $s$ can determine how many additional targets have been detected. Similarly, in statistical estimation, one sensor (or a small number of sensors) randomly activates each round and broadcasts its value. After sensors $S$ have been selected and announced their measurements, the new sensor $s$ can then compute the improvement in prediction accuracy over the previously collected data. (2) The assumption that broadcasts are possible may be realistic for dense deployments and fairly long range transmissions. In Sec. 5 we will show how assumptions (1) and (2) can be relaxed.

As we have seen in Sec. 3, the key insight in developing 
a centralized algorithm for online selection is to replace the 
greedy selection of the sensor which maximally improves the 
total utility over the set of previously selected sensors by a 
bandit algorithm. Thus, a natural approach for developing a 
distributed algorithm for sensor selection is to first consider 
the single sensor case. 



4.1 Distributed Selection of a Single Sensor 

The key challenge in developing a distributed version of 
EXP3 is to find a way to sample exactly one element from a 
probability distribution p over sensors in a distributed manner. 
This problem is distinct from randomized leader election [?], 
where the objective is to select exactly one element but the ele- 
ment need not be drawn from a specified distribution. We note 
that under the multi-hop communication model, sampling one 
element from the uniform distribution given a rooted span- 
ning tree can be done via a simple random walk [?], but that 
under the broadcast and star network models this approach de- 
generates to centralized sampling. Our algorithm, in contrast, 
samples from an arbitrary distribution by allowing sensors to 
individually decide to activate. Our bottom-up approach also 
has two other advantages: (1) it is amenable to modification 
of the activation probabilities based on local observations, as 
we discuss in Sec. 6, and (2) since it does not rely on any 
global state of the network such as a spanning tree, it can 
gracefully cope with significant edge or node failures. 

A naive distributed sampling scheme. A naive distributed
algorithm would be to let each sensor keep track of all
activation probabilities p. Then, one sensor (e.g., with the
lowest identifier) would broadcast a single random number
u uniformly distributed in [0,1], and the sensor v for which
$\sum_{i=1}^{v-1} p_i \le u < \sum_{i=1}^{v} p_i$ would activate. However, for large
sensor network deployments, this algorithm would require
each sensor to store a large amount of global information (all
activation probabilities p). Instead, each sensor v could store
only its own probability mass $p_v$; the sensors would then,
in order of their identifiers, broadcast their probabilities $p_v$,
and stop once the sum of the probabilities exceeds u. This
approach only requires a constant amount of local
information, but requires an impractical $\Theta(n)$ messages to be sent,
and sent sequentially over $\Theta(n)$ time steps.
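As a point of reference, the first naive scheme amounts to ordinary inverse-CDF sampling against a shared random number. The following is a minimal sketch under that reading; the function name `naive_select` and the example probabilities are illustrative assumptions, not part of the original protocol description.

```python
import random
from itertools import accumulate

def naive_select(p, u):
    """Return the index v with sum(p[:v]) <= u < sum(p[:v+1]).

    p is the full list of activation probabilities (the global state
    every sensor would have to store); u is the broadcast number.
    """
    for v, cumulative in enumerate(accumulate(p)):
        if u < cumulative:
            return v
    # Guard against floating-point round-off when u is very close to 1.
    return len(p) - 1

p = [0.1, 0.2, 0.7]            # hypothetical activation probabilities
u = random.random()            # drawn once, e.g., by the lowest-ID sensor
chosen = naive_select(p, u)
```

Every sensor evaluating `naive_select` on the same `u` reaches the same answer, which is why the scheme needs either the whole vector p at each sensor or a sequential pass over the sensors.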

Distributed multinomial sampling. In this section we
present a protocol that requires only $O(1)$ messages in
expectation, and only a constant amount of local information.

For a sampling procedure with input distribution p, we let
$\hat{p}$ denote the resulting distribution, where in all cases at most
one sensor is selected, and nothing is selected with probability
$1 - \sum_v \hat{p}_v$. A simple approach towards distributed sampling
would be to activate each sensor v independently of the
others with probability $p_v$. While in expectation exactly
one sensor is activated, with probability $\prod_v (1 - p_v) > 0$
no sensor is activated; also, since sensors are activated
independently, there is a nonzero probability that more than one
sensor is activated. Using a synchronized clock, the sensors
could determine if no sensor is activated. In this case, they
could simply repeat the selection procedure until at least one
sensor is activated. One naive approach would be to repeat
the selection procedure until exactly one sensor is activated.



However, with two sensors and $p_1 = \epsilon$, $p_2 = 1 - \epsilon$ this
algorithm yields $\hat{p}_1 = \epsilon^2/(1 - 2\epsilon + 2\epsilon^2) = O(\epsilon^2)$, so the first
sensor is severely underrepresented. Another simple protocol
would be to select exactly one sensor uniformly at random
from the set of activated sensors, which can be implemented
using few messages.



The Simple Protocol:

For each sensor v in parallel
    Sample $X_v \sim \text{Bernoulli}(p_v)$.
    If ($X_v = 1$), sensor v activates.
All active sensors S coordinate to select a single sensor
uniformly at random from S, e.g., by electing the
minimum ID sensor in S to do the sampling.



It is not hard to show that with this protocol, for all sensors v,

$$\hat{p}_v = p_v \cdot E\left[\frac{1}{|S|} \,\middle|\, v \in S\right] \ge \frac{p_v}{E[\,|S| \mid v \in S\,]} \ge \frac{p_v}{2}$$

by appealing to Jensen's inequality. Since $\hat{p}_v \le p_v$, we find
that this simple protocol maintains a ratio $r_v := \hat{p}_v/p_v \in [1/2, 1]$.
Unfortunately, this analysis is tight, as can be seen
from the example with two sensors and $p_1 = \epsilon$, $p_2 = 1 - \epsilon$.
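The bias of the simple protocol can be checked numerically. The sketch below computes the exact selection probabilities by enumerating all activation patterns (feasible only for a handful of sensors); the function name and the value of `eps` are illustrative assumptions.

```python
from itertools import product

def simple_protocol_distribution(p):
    """Exact selection probabilities under the simple protocol.

    Enumerates every activation pattern; within a pattern, one of the
    active sensors is chosen uniformly at random.
    """
    n = len(p)
    p_hat = [0.0] * n
    for pattern in product([0, 1], repeat=n):
        prob = 1.0
        for x, pv in zip(pattern, p):
            prob *= pv if x else 1.0 - pv
        active = [v for v, x in enumerate(pattern) if x]
        for v in active:
            p_hat[v] += prob / len(active)
    return p_hat

eps = 0.01
p = [eps, 1.0 - eps]
p_hat = simple_protocol_distribution(p)
ratios = [ph / pv for ph, pv in zip(p_hat, p)]   # approaches [1/2, 1] as eps -> 0
```

For two sensors the first ratio works out to $(1 + \epsilon)/2$, so the rare sensor is selected only about half as often as its probability mass would suggest.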

To improve upon the simple protocol, first consider running
it on an example with $p_1 = p_2 = \cdots = p_n = 1/n$. Since
the protocol behaves exactly the same under permutations of
sensor labels, by symmetry we have $\hat{p}_1 = \hat{p}_2 = \cdots = \hat{p}_n$, and
thus $r_i = r_j$ for all i, j. Now consider an input distribution
p where there exist integers N and $k_1, k_2, \ldots, k_n$ such that
$p_v = k_v/N$ for all v. Replace each v with $k_v$ fictitious
sensors, each with probability mass 1/N, and each with a
label indicating v. Run the simple protocol with the fictitious
sensors, selecting a fictitious sensor v', and then actually
select the sensor indicated by the label of v'. By symmetry
this process selects each fictitious sensor with probability
$(1 - \beta)/N$, where $\beta$ is the probability that nothing at all is
selected, and thus the process selects sensor v with probability
$k_v(1 - \beta)/N = (1 - \beta)p_v$ (since at most one fictitious sensor
is ever selected).

We may thus consider the following improved protocol
which incorporates the above idea, simulating this
modification to the protocol exactly when $p_v = k_v/N$ for all v.



The Improved Protocol(N):

For each sensor v in parallel
    Sample $X_v \sim \text{Binomial}(\lceil N \cdot p_v \rceil, 1/N)$.
    If ($X_v \ge 1$), then activate sensor v.
From the active sensors S, select sensor v with probability
$X_v / \sum_{v' \in S} X_{v'}$.



This protocol ensures the ratios $r_v := \hat{p}_v/p_v$ are the same
for all sensors, provided each $p_v$ is a multiple of 1/N.
Assuming the probabilities are rational, there will be a
sufficiently large N to satisfy this condition. To reduce
$\beta := \Pr[S = \emptyset]$ in the simple protocol, we may sample each $X_v$
from Bernoulli($\alpha \cdot p_v$) for any $\alpha \in [1, n]$. The symmetry
argument remains unchanged. This in turn suggests sampling
$X_v$ from Binomial($\lceil N \cdot p_v \rceil$, $\alpha/N$) in the improved
protocol. Taking the limit as $N \to \infty$, the binomial distribution
becomes Poisson, and we obtain the desired protocol.



The Poisson Multinomial Sampling (PMS) Protocol($\alpha$):

Same as the improved protocol, except each
sensor v samples $X_v \sim \text{Poisson}(\alpha p_v)$.



Straightforward calculation shows that

$$\Pr[S = \emptyset] = \prod_v \exp\{-\alpha \cdot p_v\} = \exp\Big\{-\alpha \sum_v p_v\Big\} = e^{-\alpha}.$$

Let C be the number of messages. Then

$$E[C] = \sum_v \Pr[X_v \ge 1] = \sum_v \left(1 - e^{-\alpha p_v}\right) \le \sum_v \alpha p_v = \alpha.$$

Here we have used linearity of expectation, and $1 + x \le e^x$
for all $x \in \mathbb{R}$. In summary, we have the following result about
our protocol:

Proposition 1. Fix any p and $\alpha > 0$. The PMS
Protocol always selects at most one sensor, ensures

$$\forall v : \Pr[v \text{ selected}] = (1 - e^{-\alpha}) p_v$$

and requires no more than $\alpha$ messages in expectation.
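Proposition 1 can be sanity-checked with a small simulation. The following is a minimal sketch that models the broadcast network as a single process; the function names, the seed, and the choice of Knuth's Poisson sampler are assumptions made for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method for drawing a Poisson(lam) variate; fine for small lam."""
    threshold = math.exp(-lam)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def pms_select(p, alpha, rng):
    """One run of the PMS protocol: returns (selected index or None, activations)."""
    # Each sensor draws its own count using only its own mass p_v.
    counts = [poisson(alpha * pv, rng) for pv in p]
    active = [v for v, x in enumerate(counts) if x >= 1]
    if not active:
        return None, 0
    # One active sensor is picked with probability proportional to its count.
    chosen = rng.choices(active, weights=[counts[v] for v in active])[0]
    return chosen, len(active)

def estimate(p, alpha, trials, seed=0):
    """Empirical selection frequencies and mean number of activations per run."""
    rng = random.Random(seed)
    hits = [0] * len(p)
    messages = 0
    for _ in range(trials):
        chosen, sent = pms_select(p, alpha, rng)
        messages += sent
        if chosen is not None:
            hits[chosen] += 1
    return [h / trials for h in hits], messages / trials
```

With, say, `estimate([0.1, 0.3, 0.6], 1.0, 100000)`, the frequencies should lie close to $(1 - e^{-1}) p_v$ and the mean number of activations should stay below $\alpha = 1$.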

In order to ensure that exactly one sensor is selected,
whenever $S = \emptyset$ we can simply rerun the protocol with fresh
random seeds as many times as needed until S is non-empty.
Using $\alpha = 1$, this modification will require only $O(1)$
messages in expectation and at most $O(\log n)$ messages with
high probability in the broadcast model. We can combine this
protocol with EXP3 to get the following result.

THEOREM 2. In the broadcast model, running EXP3 using
the PMS Protocol with $\alpha = 1$, and rerunning the
protocol whenever nothing is selected, yields exactly the same
regret bound as standard EXP3, and in each round at most
$e/(e - 1) + 2 \approx 3.582$ messages are broadcast in expectation.
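One plausible accounting of the message bound in Theorem 2 is sketched below. It is an assumption-laden reading rather than the paper's own proof (which is deferred to the Appendix): N denotes the number of protocol runs until S is non-empty, $C_i$ the number of activation broadcasts in run i, and the final two messages are taken to be the select and weight-update broadcasts of Fig. 1.

```latex
% Assumed accounting; the source defers the actual proof to its appendix.
\begin{align*}
\Pr[S \neq \emptyset] &= 1 - e^{-1},
  \qquad E[N] = \frac{1}{1 - e^{-1}} = \frac{e}{e - 1}, \\
E\Big[\sum_{i=1}^{N} C_i\Big] &= E[N]\, E[C_1]
  \le \frac{e}{e - 1} \cdot 1 \approx 1.582
  \quad \text{(Wald's identity, with } E[C_1] \le \alpha = 1\text{)}, \\
\text{total messages} &\le \frac{e}{e - 1} + 2 \approx 3.582 .
\end{align*}
```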



The regret bound for EXP3 is $O(\sqrt{\text{OPT} \cdot n \log n})$, where
OPT is the total reward of the best action. Our variant
simulates EXP3, and thus has identical regret. Proofs of our
theoretical results can be found in the Appendix.

Remark. Running our variant of EXP3 requires that each
sensor know the number of sensors, n, in order to compute its
activation probability. If each sensor v has only a reasonable
estimate $n_v$ of n, however, our algorithm still performs
well. For example, it is possible to prove that if all of the
sensors have the same estimate $n_v = cn$ for some constant
c > 0, then the upper bound on expected regret, R(c), grows
as $R(c) \approx R(1) \cdot \max\{c, 1/c\}$. The expected number of
activations in this case increases by at most $(1/c - 1)\gamma$. In
general underestimating n leads to more activations, and
underestimating or overestimating n can lead to more regret.
This graceful degradation of performance with respect to the
error in estimating n holds for all of our algorithms.
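To see where the extra activations come from, one can sum the per-sensor activation masses when every sensor uses an estimate of n. The sketch below assumes the EXP3 mass $(1 - \gamma) w_v/Z + \gamma/n_v$ with a shared estimate $n_v = cn$ and $\alpha = 1$; the helper name is hypothetical.

```python
def total_activation_mass(weights, gamma, n_estimate):
    """Sum over sensors of (1 - gamma) * w_v / Z + gamma / n_estimate."""
    Z = sum(weights)
    return sum((1.0 - gamma) * w / Z + gamma / n_estimate for w in weights)

n, c, gamma = 100, 0.5, 0.1
mass = total_activation_mass([1.0] * n, gamma, c * n)
excess = mass - 1.0            # equals (1/c - 1) * gamma under these assumptions
```

With an exact estimate (c = 1) the masses sum to one; underestimating n (c < 1) inflates the exploration term and hence the expected number of activations.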

4.2 The Distributed Online Greedy Algorithm 

We now use our single sensor selection algorithm to
develop our main algorithm, the Distributed Online Greedy
algorithm (DOG). It is based on the distributed
implementation of EXP3 using the PMS Protocol. Suppose we would
like to select k sensors at each round t. Each sensor v
maintains k weights $w_{v,1}, \ldots, w_{v,k}$ and normalizing constants
$Z_{v,1}, \ldots, Z_{v,k}$. The algorithm proceeds in k stages,
synchronized using the common clock. In stage i, a single sensor is
selected using the PMS Protocol applied to the distribution
$(1 - \gamma) w_{v,i}/Z_{v,i} + \gamma/n$. Suppose sensors $S = \{v_1, \ldots, v_{i-1}\}$
have been selected in stages 1 through i - 1. The sensor v
selected at stage i then computes its local reward $\pi_{v,i}$ using
the utility function $f_t(S \cup \{v\}) - f_t(S)$. It then computes
its new weight

$$w'_{v,i} = w_{v,i} \exp(\eta \, \pi_{v,i} / p_{v,i}),$$

and broadcasts the difference between its new and old weights
$\Delta_{v,i} = w'_{v,i} - w_{v,i}$. All sensors then update their i-th
normalizers using $Z_{v,i} \leftarrow Z_{v,i} + \Delta_{v,i}$. Fig. 1 presents the pseudo-code
of the DOG algorithm. Thus given Theorem 12 of [?] we
have the following result about the DOG algorithm:

THEOREM 3. The DOG algorithm selects, at each round
t, a set $S_t \subseteq V$ of k sensors such that
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In expectation, only $O(k)$ messages are exchanged each
round.
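The per-stage bookkeeping of DOG is small: the selected sensor rescales its own weight and broadcasts only the difference, which every sensor adds to its local normalizer. The following is a minimal sketch of that step; the function names and the example parameters are assumptions, and the network is modeled as plain lists.

```python
import math

def exp3_mass(w, Z, gamma, n):
    """Activation mass (1 - gamma) * w / Z + gamma / n."""
    return (1.0 - gamma) * w / Z + gamma / n

def stage_update(w, Z, reward, eta, gamma, n):
    """Update performed by the sensor selected in one stage.

    Returns (new_weight, delta); delta is the value the sensor would
    broadcast so that all sensors can update their normalizers.
    """
    p = exp3_mass(w, Z, gamma, n)
    new_w = w * math.exp(eta * reward / p)
    return new_w, new_w - w

# Four sensors, one stage; sensor 2 is selected and observes reward 0.5.
n, gamma, eta = 4, 0.1, 0.05
weights = [1.0] * n
normalizers = [float(n)] * n          # each sensor's local copy of Z
new_w, delta = stage_update(weights[2], normalizers[2], 0.5, eta, gamma, n)
weights[2] = new_w
normalizers = [Z + delta for Z in normalizers]
```

After the broadcast, every local normalizer again equals the sum of all weights, which is the invariant the broadcast model makes easy to maintain.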

5. THE STAR NETWORK MODEL 

In some applications, the assumption that sensors can
broadcast messages to all sensors may be unrealistic. Furthermore,
in some applications sensors may not be able to compute the
marginal benefits $f_t(S \cup \{s\}) - f_t(S)$ (since this calculation
may be computationally complex). In this section, we
analyze LAZYDOG, a variant of our DOG algorithm, which
replaces the above assumptions by the assumption that there is
a dedicated base station$^4$ available which computes utilities
and which can send non-broadcast messages to individual
sensors.



4 Though the existence of such a base station means the protocol is
not completely distributed, it is realistic in sensor network
applications where the sensor data needs to be accumulated somewhere for
analysis.



We make the following assumptions: 

1. Every sensor stores its own probability mass $p_v$, and
can only send messages to and receive messages from
the base station.

2. The base station is able, after receiving messages from
a set S of sensors, to compute the utility $f_t(S)$ and send
this utility back to the active sensors.

These conditions arise, for example, when cell phones in
participatory sensor networks can contact the base station, but
due to privacy constraints cannot directly call other phones.
We do not assume that the base station has access to all
weights of the sensors; we will only require the base
station to have $O(k + \log n)$ memory. In the fully distributed
algorithm DOG that relies on broadcasts, it is easy for the
sensors to maintain their normalizers $Z_{v,i}$, since they receive
information about rewards from all selected sensors. The
key challenge when removing the broadcast assumption is to
maintain the normalizers in an appropriate manner.

5.1 Lazy renormalization & Distributed EXP3 

EXP3 (and all MAB algorithms with no-regret guarantees against
arbitrary reward functions) must maintain a distribution over
actions, and update this distribution in response to feedback
about the environment. In EXP3, each sensor v requires
only $w_v(t)$ and a normalizer $Z(t) := \sum_{v'} w_{v'}(t)$ to compute
$p_v(t)$.$^5$ The former changes only when v is selected. In the
broadcast model the latter can simply be broadcast at the end
of each round. In the star network model (or, more generally
in multi-hop models), standard flooding echo aggregation
techniques could be used to compute and distribute the new
normalizer, though with high communication cost. We show
that a lazy renormalization scheme can significantly reduce
the amount of communication needed by a distributed bandit
algorithm without altering its regret bounds whatsoever. Thus
our lazy scheme is complementary to standard aggregation
techniques.

Our lazy renormalization scheme for EXP3 works as
follows. Each sensor v maintains its weight $w_v(t)$ and an
estimate $Z_v(t)$ for $Z(t) := \sum_{v'} w_{v'}(t)$. Initially, $w_v(0) = 1$ and
$Z_v(0) = n$ for all v. The central server stores Z(t). Let

$$p(x, y) := (1 - \gamma)\frac{x}{y} + \frac{\gamma}{n}.$$

Each sensor then proceeds to activate as in the sampling
procedure of Sec. 4.1 as if its probability mass in round
t were $q_v = p(w_v(t), Z_v(t))$ instead of its true value of
$p(w_v(t), Z(t))$. A single sensor is selected by the server
with respect to the true value Z(t), resulting in a selection
from the desired distribution. Moreover, v's estimate $Z_v(t)$

5 We let x(t) denote the value of variable x at the start of round t, to 
ease analysis. We do not actually need to store the historical values 
of the variables over multiple time steps. 



Algorithm: Distributed Online Greedy (DOG) (described in the broadcast model)

Input: $k \in \mathbb{N}$, a set V, and $\alpha, \gamma, \eta \in \mathbb{R}_{>0}$. Reasonable defaults are any $\alpha \in [1, \ln |V|]$, and
$\gamma = \eta = \min\left(1, (|V| \ln |V| / g)^{1/2}\right)$, where g is a guess for the maximum cumulative reward of any single sensor [?].

Initialize $w_{v,i} \leftarrow 1$ and $Z_{v,i} \leftarrow |V|$ for all $v \in V$, $i \in [k]$. Let $p(x, y) := (1 - \gamma)\frac{x}{y} + \frac{\gamma}{|V|}$.

for each round t = 1, 2, 3, ... do
    Initialize $S_{v,t} \leftarrow \emptyset$ for each v in parallel.
    for each stage $i \in [k]$ do
        for each sensor $v \in V$ in parallel do
            repeat
                Sample $X_v \sim \text{Poisson}(\alpha \cdot p(w_{v,i}, Z_{v,i}))$.
                if ($X_v \ge 1$) then
                    Broadcast (sampled $X_v$, id(v)); Receive messages from sensors S. (Include $v \in S$ for convenience.)
                    if id(v) = $\min_{v' \in S}$ id(v') then
                        Select exactly one element $v_{it}$ from S such that each v' is selected with probability $X_{v'} / \sum_{u \in S} X_u$.
                        Broadcast (select id($v_{it}$)).
                    Receive message (select id($v_{it}$)).
                    if id(v) = id($v_{it}$) then
                        Observe $f_t(S_{v,t} + v)$; $\pi \leftarrow f_t(S_{v,t} + v) - f_t(S_{v,t})$; $\Delta_v \leftarrow w_{v,i}(\exp\{\eta \cdot \pi / p(w_{v,i}, Z_{v,i})\} - 1)$;
                        $w_{v,i} \leftarrow w_{v,i} + \Delta_v$; Broadcast (weight update $\Delta_v$, id(v)).
                if receive message (weight update $\Delta$, id($v_{it}$)) then $S_{v,t} \leftarrow S_{v,t} \cup \{v_{it}\}$; $Z_{v,i} \leftarrow Z_{v,i} + \Delta$;
            until v receives a message of type (select id);

Output: At the end of each round t each sensor has an identical local copy $S_{v,t}$ of the selected set $S_t$.

Figure 1: The Distributed Online Greedy Algorithm



is only updated on rounds when it communicates with the
server under these circumstances. This allows the estimated
probabilities of all of the sensors to sum to more than one, but
has the benefit of significantly reducing the communication
cost in the star network model under certain assumptions.
We call the result Distributed EXP3, and give its pseudocode for
round t in Fig. 2.

Since the sensors underestimate their normalizers, they 
may activate more frequently than in the broadcast model. 
Fortunately, the amount of "overactivation" remains bounded. 
We prove Theorem 4 and Corollary 5 in Appendix B. 

THEOREM 4. The number of sensor activations in any
round of the Distributed EXP3 algorithm is at most $\alpha + (e - 1)$
in expectation and $O(\alpha + \log n)$ with high probability, and
the number of messages is at most twice the number of
activations.
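The interplay between stale sensor-side normalizers and the server's true normalizer can be illustrated with a single-process simulation. This is a sketch under several assumptions: the function names are hypothetical, payoffs are non-negative, and the case where sensors contact the server but all recomputed Poisson counts are zero is treated as "nothing selected", which the pseudocode does not spell out.

```python
import math
import random

def mass(w, Z, gamma, n):
    return (1.0 - gamma) * w / Z + gamma / n

def poisson_quantile(lam, r):
    """Smallest x with Pr[Poisson(lam) <= x] >= r (stops if terms underflow)."""
    x, term = 0, math.exp(-lam)
    cdf = term
    while cdf < r and term > 0.0:
        x += 1
        term *= lam / x
        cdf += term
    return x

def lazy_round(w, Z_est, Z_true, alpha, eta, gamma, payoff, rng):
    """One round; mutates w and Z_est, returns (selected, new Z_true, activations)."""
    n = len(w)
    # Sensors decide locally, using their possibly stale estimates Z_est[v].
    r = [rng.random() for _ in range(n)]
    S = [v for v in range(n)
         if r[v] >= 1.0 - alpha * mass(w[v], Z_est[v], gamma, n)]
    # The server recomputes the Poisson counts against the true normalizer.
    Y = {v: poisson_quantile(alpha * mass(w[v], Z_true, gamma, n), r[v]) for v in S}
    selected = None
    if sum(Y.values()) > 0:
        selected = rng.choices(S, weights=[Y[v] for v in S])[0]
        p_sel = mass(w[selected], Z_true, gamma, n)
        new_w = w[selected] * math.exp(eta * payoff(selected) / p_sel)
        Z_true += new_w - w[selected]
        w[selected] = new_w
    for v in S:   # only sensors that contacted the server get a fresh normalizer
        Z_est[v] = Z_true
    return selected, Z_true, len(S)
```

Because weights only grow when payoffs are non-negative, each stale estimate is at most the true normalizer, so a sensor's local mass never falls below its true mass and no sensor that should have activated is missed.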

Unfortunately, there is still an $e^{-\alpha}$ probability of nothing
being selected. To address this, we can set $\alpha = c \ln n$ for
some $c \ge 1$, and if nothing is selected, transmit a message to
each of the n sensors to rerun the protocol.

COROLLARY 5. There is a distributed implementation of
EXP3 that always selects a sensor in each round, has the
same regret bounds as standard EXP3, ensures that the
number of sensor activations in any round is at most $\ln n + O(1)$
in expectation or $O(\log n)$ with high probability, and in which
the number of messages is at most twice the number of
activations.

5.2 LazyDOG 

Once we have the distributed EXP3 variant described
above, we can use it for the bandit subroutines in the OG UNIT
algorithm (cf. Sec. 3.2). We call the result the LAZYDOG
algorithm, due to its use of lazy renormalization. The lazy
distributed EXP3 still samples sensors from the same
distribution as the regular distributed EXP3, so LAZYDOG has
precisely the same performance guarantees with respect to
$\sum_t f_t(S_t)$ as DOG. It works in the star network
communication model, and requires few messages or sensor activations.
Corollary 5 immediately implies the following result.

COROLLARY 6. The number of sensors that activate each
round in LAZYDOG is at most $k \ln n + O(k)$ in expectation
and $O(k \log n)$ with high probability, the number of messages
is at most twice the number of activations, and the $(1 - 1/e)$-regret
of LAZYDOG is the same as DOG.

If we are not concerned about the exact number of sensors
selected in each round, but only want to ensure roughly k
sensors are picked in expectation, then we can reduce the
number of sensor activations and messages to $O(k)$, by
running LAZYDOG with $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ stages for some
constant $\alpha$, and allowing each stage to run the Poisson
Multinomial Sampling Protocol with lazy renormalization without


Algorithm: Distributed EXP3 (executing on round t)
Input: Parameters $\alpha, \eta, \gamma \in \mathbb{R}_{>0}$, sensor set V.

Let $p(x, y) := (1 - \gamma)\frac{x}{y} + \frac{\gamma}{|V|}$.
Sensors:
    foreach sensor v in parallel do
        Sample $r_v$ uniformly at random from [0, 1].
        if ($r_v \ge 1 - \alpha \cdot p(w_v(t), Z_v(t))$) then
            Send ($r_v$, $w_v(t)$) to the server.
            Receive message (Z, w) from server.
            $Z_v(t+1) \leftarrow Z$; $w_v(t+1) \leftarrow w$.
        else $Z_v(t+1) \leftarrow Z_v(t)$; $w_v(t+1) \leftarrow w_v(t)$.

Server:
    Receive messages from a set S of sensors.
    if $S = \emptyset$ then Select nothing and wait for next round.
    else
        foreach sensor $v \in S$ do
            $Y_v \leftarrow \min\{x : \Pr[X \le x] \ge r_v\}$, where $X \sim \text{Poisson}(\alpha \cdot p(w_v(t), Z(t)))$.
        Select v with probability $Y_v / \sum_{v' \in S} Y_{v'}$.
        Observe the payoff $\pi$ for the selected sensor $v^*$;
        $w_{v^*}(t+1) \leftarrow w_{v^*}(t) \cdot \exp\{\eta \pi / p(w_{v^*}(t), Z(t))\}$;
        $Z(t+1) \leftarrow Z(t) + w_{v^*}(t+1) - w_{v^*}(t)$;
        foreach $v \in S \setminus \{v^*\}$ do $w_v(t+1) \leftarrow w_v(t)$;
        foreach $v \in S$ do Send ($Z(t+1)$, $w_v(t+1)$) to v.

Figure 2: Distributed EXP3: the PMS Protocol($\alpha$)
with lazy renormalization, applied to EXP3



rerunning it if nothing is selected. This is of course optimal 
up to constants, as we must send at least one message per 
selected sensor. 

THEOREM 7. The variant of LAZYDOG that runs the
Poisson Multinomial Sampling Protocol($\alpha$) with lazy
renormalization for $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ stages, but does not
rerun it if nothing is selected in a given stage, has the
following guarantees: (1) the number of sensors that activate each
round in LAZYDOG is at most $k'(\alpha + e - 1)$ in expectation
and $O(\alpha k \log n)$ with high probability, (2) the number of
messages is at most twice the number of activations, (3) the
expected number of sensors selected in each round is at most
k' and (4) its $(1 - 1/e)$-regret is at most k'/k times that of
DOG.

We defer the proof to Appendix C. 
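The choice of k' can be made concrete with a few lines of arithmetic. The sketch below assumes, following Proposition 1, that each stage selects a sensor with probability $1 - e^{-\alpha}$; the function names are hypothetical.

```python
import math

def num_stages(k, alpha):
    """k' = ceil(k / (1 - exp(-alpha)))."""
    return math.ceil(k / (1.0 - math.exp(-alpha)))

def expected_selected(k, alpha):
    """Expected number of stages that select a sensor when each of the
    k' stages succeeds with probability 1 - exp(-alpha)."""
    return num_stages(k, alpha) * (1.0 - math.exp(-alpha))

k_prime = num_stages(5, 1.0)          # number of stages for k = 5, alpha = 1
avg = expected_selected(5, 1.0)       # lies between k and k'
```

Rounding up to k' stages compensates for the stages in which nothing is selected, so that roughly k sensors are picked on average without any reruns.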

6. OBSERVATION-DEPENDENT SAMPLING 

Theorem 3 states that DOG is guaranteed to do nearly
as well as the offline greedy algorithm run on an instance
with objective function $f_\Sigma := \sum_t f_t$. Thus the reward of
DOG is asymptotically near-optimal on average. In many
applications, however, we would like to perform well on
rounds with "atypical" objective functions. For example, in
an outbreak detection application as we discuss in Sec. 7, we
would like to get very good data on rounds with significant
events, even if the nearest sensors typically report "boring"
readings that contribute very little to the objective function.
For now, suppose that we are only running a single MAB
instance to select a single sensor in each round. If we have
access to a black-box for evaluating $f_t$ on round t, then we
can perform well on atypical rounds at the cost of some
additional communication by having each sensor v take a
local reading of its environment and estimate its payoff
$\pi = f_t(\{v\})$ if selected. This value, which serves as a measure of
how interesting its input is, can then be used to decide whether
to boost v's probability for reporting its sensor reading to the
server. In the simplest case, we can imagine that each v has a
threshold $\tau_v$ such that v activates with probability 1 if $\pi \ge \tau_v$,
and with its normal probability otherwise. In the case where
we select k > 1 sensors in each round, each sensor can have
a threshold for each of the k stages, where in each stage it
computes $\pi = f_t(S \cup \{v\}) - f_t(S)$ where S is the set of
currently selected sensors. Since the activation probability
only goes up, we can retain the performance guarantees of
DOG if we are careful to adjust the feedback properly.

Ideally, we wish that the sensors learn what their thresholds
$\tau_v$ should be. We treat the selection of $\tau_v$ in each round
as an online decision problem that each v must play. We
construct a particular game that the sensors play, where the
strategies are the thresholds (suitably discretized), there is an
activation cost $c_v$ that v pays if $\pi_v \ge \tau_v$, and the payoffs are
defined as follows: Let $\pi_v = f_t(S \cup \{v\}) - f_t(S)$ be the
marginal benefit of selecting v given that sensor set S has
already been selected. Let A be the set of sensors that activate
in the current iteration of the game, and let
$\max \pi_{(A \setminus v)} := \max(\pi_{v'} : v' \in A \setminus \{v\})$. The particular reward function $\psi_v$
we choose for each sensor v for each iteration of the game is

$$\psi_v(\tau) = \begin{cases} c_v - \max(\pi_v - \max \pi_{(A \setminus v)}, 0) & \text{if } \pi_v < \tau \\ \max(\pi_v - \max \pi_{(A \setminus v)}, 0) - c_v & \text{if } \pi_v \ge \tau \end{cases}$$

based on empirical performance. Thus, if a sensor activates
($\pi \ge \tau$), its payoff is the improvement over the best payoff
$\pi_{v'}$ among all sensors $v' \in A$ minus its activation cost. In
case multiple sensors activate, the highest reward is retained.
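The threshold game's reward can be written as a short function. The sketch below rests on one reading of the reward definition (payoff equals improvement over the best other activated sensor minus the activation cost when the sensor activates, and the negation of that quantity otherwise); the function name and the convention that the best competing payoff is 0 when no other sensor activates are assumptions.

```python
def threshold_reward(pi_v, tau, cost, other_payoffs):
    """Reward to sensor v for playing threshold tau in one iteration.

    pi_v: marginal benefit of v; other_payoffs: marginal benefits of the
    other sensors that activated (may be empty).
    """
    best_other = max(other_payoffs, default=0.0)   # assumed convention
    gain = max(pi_v - best_other, 0.0)
    if pi_v >= tau:            # the sensor activates and pays its cost
        return gain - cost
    return cost - gain         # the sensor stays silent

r_active = threshold_reward(0.8, 0.5, 0.1, [0.3])   # activates: gain minus cost
r_silent = threshold_reward(0.8, 0.9, 0.1, [0.3])   # silent: cost minus gain
```

Under this reading, a threshold that keeps a sensor silent is rewarded exactly when activating would not have been worth the cost.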

In the broadcast model where each sensor can compute its
marginal benefit, we can use any standard no-regret algorithm
for combining expert advice, such as Randomized Weighted
Majority (WMR) [?], to play this game and obtain no-regret
guarantees$^6$ for selecting $\tau_v$. In our context a sensor using
WMR simply maintains weights $w(\tau_i) = \exp(\eta \cdot \psi_{\text{total}}(\tau_i))$
for each possible threshold $\tau_i$, where $\eta > 0$ is a learning
parameter, and $\psi_{\text{total}}(\tau_i)$ is the total cumulative reward for
playing $\tau_i$ in every round so far. On each step each threshold

6 We leave it as an open problem to determine if the outcome is close 
to optimal when all sensors play low regret strategies (i.e., is the 
price of total anarchy [?] small in any variant of this game with a 
reasonable way of splitting the value from the information?) 



is picked with probability proportional to its weight. In the
more restricted star network model, we can use a modification
of WMR that feeds back unbiased estimates for $\psi_t(\tau_i)$, the
payoff to the sensor for using a threshold of $\tau_i$ in round t,
and thus obtains reasonably good estimates of $\psi_{\text{total}}(\tau_i)$ after
many rounds. We give pseudocode in Fig. 3. In it, we assume
that an activated sensor can compute the reward of playing
any threshold.
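A sensor-side learner for the star network setting might look as follows. This is a minimal sketch of importance-weighted multiplicative updates; the class and method names are hypothetical, and the caller is assumed to supply, for every threshold, the reward and the total activation probability conditioned on that threshold.

```python
import math
import random

class ThresholdLearner:
    """Keeps one weight per candidate threshold for a single sensor."""

    def __init__(self, thresholds, eta, rng=None):
        self.thresholds = list(thresholds)
        self.eta = eta
        self.weights = [1.0] * len(self.thresholds)
        self.rng = rng or random.Random()

    def pick(self):
        """Index of a threshold drawn with probability proportional to its weight."""
        return self.rng.choices(range(len(self.thresholds)), weights=self.weights)[0]

    def update(self, rewards, activation_probs):
        """Call only on rounds in which the sensor activated.

        Dividing each reward by the activation probability under that
        threshold keeps the feedback unbiased.
        """
        for i, (psi, q) in enumerate(zip(rewards, activation_probs)):
            self.weights[i] *= math.exp(self.eta * psi / q)

learner = ThresholdLearner([0.2, 0.8], eta=0.5, rng=random.Random(0))
choice = learner.pick()
learner.update(rewards=[1.0, -1.0], activation_probs=[1.0, 0.5])
```

Thresholds under which the sensor rarely activates receive larger corrections when feedback does arrive, which compensates for the rounds in which no feedback is observed.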



We incorporate these ideas into the DOG algorithm, to
obtain what we call the Observation-Dependent Distributed
Online Greedy algorithm (OD-DOG). In the extreme case
that $c_v = 0$ for all v the sensors will soon set their thresholds
so low that each sensor activates in each round. In this case
OD-DOG will exactly simulate the offline greedy algorithm
run on each round. In other words, if we let G(f) be the
result of running the offline greedy algorithm on the problem

$$\arg\max\{f(S) : S \subseteq V, |S| \le k\}$$

then OD-DOG will obtain a value of $\sum_t f_t(G(f_t))$; in
contrast, DOG gets roughly $\sum_t f_t(G(\sum_t f_t))$, which may be
significantly smaller. Note that Feige's result [?] implies that
the former value is the best we can hope for from efficient
algorithms (assuming P $\ne$ NP). Of course, querying each
sensor in each round is impractical when querying sensors is
expensive. In the other extreme case where $c_v = \infty$ for all v,
OD-DOG will simulate DOG after a brief learning phase. In
general, by adjusting the activation costs $c_v$ we can smoothly
trade off the cost of sensor communication with the value of
the resulting data.
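For concreteness, the offline greedy algorithm G(f) referred to above can be sketched as follows. The coverage objective and the sensor-to-target map `detects` are made-up illustrations in the spirit of the target detection example, not data from the experiments.

```python
def offline_greedy(f, ground_set, k):
    """Greedily pick up to k elements, each maximizing the marginal gain of f."""
    selected = []
    for _ in range(k):
        remaining = [v for v in ground_set if v not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda v: f(selected + [v]) - f(selected))
        selected.append(best)
    return selected

# Hypothetical coverage instance: which targets each sensor detects.
detects = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def coverage(S):
    """Number of distinct targets detected by the sensors in S (submodular)."""
    return len(set().union(*(detects[v] for v in S)))

chosen = offline_greedy(coverage, list(detects), 2)
value = coverage(chosen)
```

Running this once per round on $f_t$ corresponds to the $c_v = 0$ extreme, while running it once on the summed objective corresponds to the benchmark that DOG competes with.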

7. EXPERIMENTS 

In this section, we evaluate our DOG algorithm on several 
real-world sensing problems. 

7.1 Data sets 



Temperature data. In our first data set, we analyze
temperature measurements from the network of 46 sensors deployed
at Intel Research Berkeley. Our training data consisted of
samples collected at 30 second intervals on 3 consecutive
days (starting Feb. 28th 2004); the testing data consisted of
the corresponding samples on the two following days. The
objective functions used for this application are based on the
expected reduction in mean squared prediction error $f_{\text{EMSE}}$,
as introduced in Sec. 2.

Precipitation data. Our second data set consists of
precipitation data collected during the years 1949 - 1994 in the
states of Washington and Oregon [?]. Overall 167 regions
of equal area, approximately 50 km apart, reported the daily
precipitation. To ensure the data could be reasonably
modeled using a Gaussian process we applied preprocessing as
described in [?]. As objective functions we again use the
expected reduction in mean squared prediction error $f_{\text{EMSE}}$.

Water network monitoring. Our third data set is based on 
the application of monitoring for outbreak detection. Con- 
sider a city water distribution network for delivering drinking 
water to households. Accidental or malicious intrusions can 
cause contaminants to spread over the network, and we want 
to install sensors to detect these contaminations as quickly 
as possible. In August 2006, the Battle of Water Sensor Net- 
works (BWSN) [?] was organized as an international chal- 
lenge to find the best sensor placements for a real metropolitan 
water distribution network, consisting of 12,527 nodes. In 
this challenge, a set of intrusion scenarios is specified, and 
for each scenario a realistic simulator provided by the EPA 
is used to simulate the spread of the contaminant for a 48 
hour period. An intrusion is considered detected when one 
selected node shows positive contaminant concentration. The 
goal of BWSN was to minimize impact measures, such as 
the expected population affected, which is calculated using 
a realistic disease model. For a security-critical sensing task 
such as protecting drinking water from contamination, it is 
important to develop sensor selection schemes that maximize 
detection performance even in adversarial environments (i.e., 
where an adversary picks the contamination strategy know- 
ing our network deployment and selection algorithm). The 
algorithms developed in this paper apply to such adversarial 
settings. We reproduce the experimental setup detailed in 
[?]. For each contamination event i, we define a separate
submodular objective function $f_i(S)$ that measures the expected
population protected when detecting the contamination from
sensors S. In [?], Krause et al. showed that the functions
$f_i(A)$ are monotone submodular functions.

7.2 Convergence experiments 

In our first set of experiments, we analyzed the conver- 
gence of our DOG algorithm. For both the temperature [T] 



Algorithm: Modified WMR (star network setting)

Input: parameter $\eta > 0$, threshold set $\{\tau_i : i \in [m]\}$
Initialize $w(\tau_i) \leftarrow 1$ for all $i \in [m]$.
for each round t = 1, 2, ... do
    Select $\tau_i$ with probability $w(\tau_i) / \sum_{j=1}^{m} w(\tau_j)$.
    if sensor activates then
        Let $\psi(\tau_i)$ be the reward for playing $\tau_i$ in this
        round of the game. Let $q(\tau_i)$ be the total
        probability of activation conditioned on $\tau_i$ being
        selected (including the activation probability that
        does not depend on local observations).
        for each threshold $\tau_i$ do
            $w(\tau_i) \leftarrow w(\tau_i) \exp(\eta \, \psi(\tau_i) / q(\tau_i))$.

Figure 3: Selecting activation thresholds for a sensor



and precipitation [R] data sets, we first run the offline greedy
algorithm using the $f_{\text{EMSE}}$ objective function to pick k = 5
sensors. We compare its performance to the DOG algorithm,
where we feed back the same objective function at every
round. We use an exploration probability $\gamma = 0.01$ and a
learning rate inversely proportional to the maximum
achievable reward $f_{\text{EMSE}}(V)$. Fig. 4(a) presents the results for
the temperature data set. Note that even after only a small
number of rounds ($\approx 100$), the algorithm obtains 95% of the
performance of the offline algorithm. After about 13,000
iterations, the algorithm obtains 99% of the offline performance,
which is the best that can be expected with a 0.01 exploration
probability. Fig. 4(b) shows the same experiment on the
precipitation data set. In this more complex problem, after 100
iterations, 76% of the offline performance is obtained, which
increases to 87% after 500,000 iterations.

7.3 Observation dependent activation 

We also experimentally evaluate our OD-DOG algorithm 
with observation specific sensor activations. We choose dif- 
ferent values for the activation cost c v , which we vary as 
multiples of the total achievable reward. The activation cost 
c v lets us smoothly trade off the average number of sensors 
activating each round and the average obtained reward. The 
resulting activation strategies are used to select a subset of 
size k = 10 from a collection of 12,527 sensors. Fig. 4(c) 
presents rates of convergence using the OD-DOG algorithm 
under a fixed objective function which considers all contami- 
nation events. In Fig. 4(d), convergence rates are presented 
under a varying objective function, which selects a differ- 
ent contamination event on each round. For low activation 
costs, the performance quickly converges to or exceeds the 
performance of the offline solution. Even under the lowest ac- 
tivation costs in our experiments, the average number of extra 
activations per stage in the OD-DOG algorithm is at most 5. 
These results indicate that observation specific activation can 
lead to drastically improved performance at small additional 
activation cost. 

8. RELATED WORK 

Sensor Selection. The problem of deciding when to selec- 
tively turn on sensors in sensor networks in order to conserve 
power was first discussed by [?] and [?]. Many approaches 
for optimizing sensor placements and selection assume that 
sensors have a fixed region [?, ?, ?]. These regions are usually 
convex or even circular. Further, it is assumed that everything 
within a region can be perfectly observed, and everything 
outside cannot be measured by the sensors. For complex 
applications such as environmental monitoring, these assump- 
tions are unrealistic, and the direct optimization of prediction 
accuracy is desired. The problem of selecting observations 
for monitoring spatial phenomena has been investigated ex- 



tensively in geostatistics [?], and more generally (Bayesian) 
experimental design [?]. Several approaches have been pro- 
posed to activate sensors in order to minimize uncertainty [?] 
or prediction error [?]. However, these approaches do not 
have performance guarantees. Submodularity has been used 
to analyze algorithms for placing [?] or selecting [?] a fixed 
set of sensors. These approaches however assume that the 
model is known in advance. 

Submodular optimization. The problem of centralized 
maximization of a submodular function has been studied 
by [?], who proved that the greedy algorithm gives a factor
$(1 - 1/e)$ approximation. Several algorithms have since been
developed for maximizing submodular functions under more 
complex constraints (see [?] for an overview). Streeter and 
Golovin developed an algorithm for online optimization of 
submodular functions, which we build on in this paper [?]. 

9. CONCLUSIONS 

In this paper, we considered the problem of repeatedly selecting
subsets $S_t$ from a large set of deployed sensors, in
order to maximize a sequence of submodular utility functions
$f_1, \ldots, f_T$. We developed an efficient Distributed Online
Greedy algorithm DOG, and proved it suffers no $(1 - 1/e)$-regret,
essentially the best possible performance obtainable
unless P = NP. Our algorithm is fully distributed, requiring
only a small number of messages to be exchanged at each 
round with high probability. We analyze our algorithm both 
in the broadcast model, and in the star network model, where 
a separate base station is responsible for computing utilities 
of selected sets of sensors. Our LAZYDOG algorithm for
the latter model uses lazy renormalization in order to reduce
the number of messages required from $O(n)$ to $O(k \log n)$,
and the server memory required from $\Theta(n)$ to $O(k + \log n)$,
where $k$ is the desired number of sensors to be selected. In
addition, we developed OD-DOG, an extension of DOG that 
allows observation-dependent sensor selection. We empiri- 
cally demonstrate the effectiveness of our algorithms on three 
real-world sensing tasks, demonstrating how our DOG algo- 
rithm's performance converges towards the performance of a 
clairvoyant offline greedy algorithm. In addition, our results 
with the OD-DOG algorithm indicate that a small number of 
extra sensor activations can lead to drastically improved con- 
vergence. We believe that our results provide an interesting 
step towards a principled study of distributed active learning 
and information gathering. 
Figure 4: Experimental results on [T] temperature data, [R] precipitation data and [W] water distribution network data.



APPENDIX 

A. RESULTS IN THE BROADCAST MODEL 

Proof of Theorem 2. To prove the regret bounds, note
that in every round the distribution over sensor selections in
the variant of EXP3 we describe (that uses the distributed
multinomial sampling scheme and repeatedly reruns the protocol
in order to always select some sensor in each round)
is precisely the same as in the original EXP3. Thus the regret
bounds for EXP3 [?] carry over unchanged. We next bound
the number of broadcasts. Fix a round, and let $S$ be the set of sensors
that activate in that round. The total number of broadcasts
is then $|S| + 2$: using their calibrated clocks, each sensor
(re)samples $X_v \sim \mathrm{Poisson}(\alpha p_v)$ and activates if $X_v \ge 1$.
If no sensors activate before a specified timeout period, the
default behavior is to rerun the sampling step. Eventually
$|S| \ge 1$ sensors activate in the same period. A distinguished
sensor in $S$ then determines the selected sensor $v$, broadcasts
$\mathrm{id}(v)$, and $v$ broadcasts its observed reward. We prove
$\mathbb{E}[|S|] \le \alpha/(1 - e^{-\alpha})$ in Proposition 8. When $\alpha = 1$, this
gives us the claimed bound on the number of broadcasts. □

PROPOSITION 8. Rerunning the Poisson Multinomial Sampling
Protocol until an element is selected results in at most
$\alpha/(1 - e^{-\alpha})$ elements being activated in expectation. Moreover,
this value is tight.

Proof. Let $X_v \sim \mathrm{Bernoulli}(\alpha \cdot p_v)$ be the indicator random
variable for the activation of $v$, and let $X := \sum_v X_v$.
The expected number of sensor activations is then

$$\mathbb{E}[X \mid X \ge 1] = \mathbb{E}[X] / \Pr[X \ge 1].$$

In the limit as $\max_v p_v$ tends to zero, $X$ converges to a Poisson
random variable with mean $\alpha$. In this case,
$\mathbb{E}[X]/\Pr[X \ge 1] = \alpha/(1 - e^{-\alpha})$. To see that this is an upper bound, consider
an arbitrary distribution $p$ on the sensors, and fix some $v$
with $x := p_v > 0$. We claim that replacing $v$ with two sensors
$v_1$ and $v_2$ with positive probability mass $x_1$ and $x_2$ with
$x = x_1 + x_2$ can only serve to increase the expected number
of sensor activations, because $\mathbb{E}[X]$ is unchanged, and
$\Pr[X \ge 1]$ decreases. The latter is true essentially because
$\Pr[\exists i \in \{1, 2\} : v_i \text{ activates}] = 1 - (1 - x_1)(1 - x_2) = x - x_1 x_2 < x$.
To complete the proof, notice that repeating
this process with $v = \arg\max_v (p_v)$ and $x_1 = x/2$ ensures
$X$ converges to a Poisson variable with mean $\alpha$, while only
increasing $\mathbb{E}[X \mid X \ge 1]$. □
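The bound of Proposition 8 can be spot-checked numerically. The following Python sketch evaluates $\mathbb{E}[X]/\Pr[X \ge 1]$ exactly for independent Bernoulli($\alpha p_v$) activations and compares it with the Poisson limit $\alpha/(1 - e^{-\alpha})$. The function names and the example distributions are illustrative assumptions, not part of the protocol.

```python
import math


def expected_activations_with_rerun(p, alpha):
    """E[X | X >= 1] for independent Bernoulli(alpha * p_v) activations.

    p is a probability vector over sensors and alpha is the oversampling
    parameter. Requires alpha * max(p) <= 1 so each term is a probability.
    """
    if any(alpha * pv > 1.0 for pv in p):
        raise ValueError("alpha * p_v must not exceed 1")
    expected_x = sum(alpha * pv for pv in p)             # E[X]
    prob_none = math.prod(1.0 - alpha * pv for pv in p)  # Pr[X = 0]
    return expected_x / (1.0 - prob_none)


def poisson_limit(alpha):
    """Limiting value alpha / (1 - exp(-alpha)) from Proposition 8."""
    return alpha / (1.0 - math.exp(-alpha))
```

Evaluating `expected_activations_with_rerun([1.0 / n] * n, 1.0)` for growing `n` approaches `poisson_limit(1.0)` from below, and splitting one sensor's mass across two sensors never decreases the value, which matches the splitting argument in the proof.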

B. RESULTS IN THE STAR NETWORK MODEL

In this section we will prove that lazy renormalization samples
sensors from a properly scaled distribution $(1 - e^{-\alpha}) p_v$,
where $p_v$ is the input distribution. We then bound the communication
overhead of using lazy renormalization for any
MAB algorithm satisfying certain assumptions enumerated
below, and then show how these bounds apply to EXP3.



2. There exists an $\epsilon > 0$ such that $p_v(t) \ge \epsilon$ for all $t$.



PROPOSITION 9. The lazy renormalization scheme of Sec. 5.1,
described in pseudocode in Fig. 2, samples $v$ with probability
$(1 - e^{-\alpha}) p_v$, where $p_v = p(w_v(t), Z(t))$ is the desired
probability mass for $v$.

PROOF. Lazy renormalization selects each sensor $v$ with
probability $(1 - e^{-\alpha}) p_v$, because of the way the random bits
$r_v$ are shared in order to implement a coupled distribution for
sensor activation and selection. Note that it would be sufficient
to run the Poisson Multinomial Sampling Protocol on
the correct (possibly oversampled) probabilities, $\alpha p_v$, since
then Prop. 1 ensures that each $v$ is selected with probability
$(1 - e^{-\alpha}) p_v$. The difficulty is that $v$ does not have access
to the correct normalizer $Z(t)$, but only its estimate (lower
bound) for it, $Z_v(t)$. To overcome this difficulty, we define
a joint probability distribution over two random variables
$(X_v, Y_v)$, where

$$X_v = X_v(R) := \begin{cases} 1 & \text{if } R \ge 1 - \alpha \cdot p(w_v(t), Z_v(t)) \\ 0 & \text{otherwise} \end{cases}$$

$$Y_v = Y_v(R) := \min \left\{ b : \sum_{a=0}^{b} e^{-\lambda} \frac{\lambda^a}{a!} \ge R \right\}$$

and $\lambda := \alpha \cdot p(w_v(t), Z(t))$, and $R$ is sampled uniformly
at random from $[0, 1]$. Now, note that $Y_v$ is distributed as
$\mathrm{Poisson}(\lambda)$. Also note that $Y_v \ge 1$ implies $X_v \ge 1$, because
$Y_v \ge 1$ implies $R > e^{-\lambda}$ and

$$e^{-\lambda} \ge 1 - \lambda \ge 1 - \alpha \cdot p(w_v(t), Z_v(t))$$

since $1 + x \le e^x$ for all $x \in \mathbb{R}$, and $p(w_v(t), Z_v(t)) \ge p(w_v(t), Z(t))$
due to the fact that $Z_v(t) \le Z(t)$. It follows that
we can use the event $X_v \ge 1$ as a conservative indicator
that $v$ should activate. In this case, it will send its sampled
value for $R$, namely $r_v$, and its weight $w_v(t)$ to the server.
The server knows $Z(t)$, and then can use $r_v$ and $w_v(t)$ to
compute $Y_v(r_v)$, the sample from $\mathrm{Poisson}(\lambda)$ that $v$ would
have drawn had it known $Z(t)$. The resulting distribution on
selected sensors is thus exactly the same as in the Poisson
Multinomial Sampling Protocol without lazy renormalization.
Invoking Prop. 1 thus completes the proof. □
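The coupling used in this proof can be sketched in a few lines of Python. The sensor applies the conservative test $X_v(R)$ with its stale normalizer, and the server recomputes the Poisson sample $Y_v(R)$ from the same shared value of $R$ by inverting the Poisson CDF. The helper names and the choice $p(w, Z) = w/Z$ are assumptions made for illustration only; the actual map $p$ depends on the bandit algorithm in use.

```python
import math


def p(weight, normalizer):
    """Hypothetical probability map p(w, Z); plain normalization is assumed."""
    return weight / normalizer


def sensor_activates(r, alpha, weight, z_local):
    """X_v(R): the sensor's conservative test using its estimate Z_v(t) <= Z(t)."""
    return r >= 1.0 - alpha * p(weight, z_local)


def server_poisson_sample(r, alpha, weight, z_true):
    """Y_v(R): inverse-CDF Poisson(lam) sample with lam = alpha * p(w, Z(t))."""
    lam = alpha * p(weight, z_true)
    term = math.exp(-lam)  # e^{-lam} * lam^a / a! for a = 0
    cdf = term
    b = 0
    # Advance b until the CDF reaches r; the term > 0 guard stops the loop
    # if floating-point underflow prevents the CDF from ever reaching r.
    while cdf < r and term > 0.0:
        b += 1
        term *= lam / b
        cdf += term
    return b
```

Because the sensor's estimate `z_local` is a lower bound on `z_true`, every value of `r` for which the server computes a sample of at least 1 is also a value for which the sensor activated, so the server never needs a report that was not sent.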

We now describe the assumptions that are sufficient to 
ensure lazy renormalization has low communication costs. 
Fix an action $v$ and a multiarmed bandit algorithm. Let
$p_v(t) \in [0, 1]$ be the random variable denoting the probability
the algorithm assigns to $v$ on round $t$. The value of
$p_v(t)$ depends on the random choices made by the algorithm
and the payoffs observed by it on previous rounds. We assume
the following about each $p_v(t)$.

1. $p_v(t)$ can be computed from local information $v$ possesses
and global information the server has.

3. $p_v(t) < p_v(t + 1)$ implies $v$ was selected in round $t$.

4. There exists $\hat{\epsilon} > 0$ such that $p_v(t + 1) \ge p_v(t)/(1 + \hat{\epsilon})$
for all $t$.

Many MAB algorithms satisfy these conditions. For example,
all MAB algorithms with non-trivial no-regret guarantees
against adversarial payoff functions must continually explore
all their options, which effectively mandates $p_v(t) \ge \epsilon$ for
some $\epsilon > 0$. In Lemma 1 we prove that EXP3 does so with
$\epsilon = \gamma/n$ (assumption 2) and $\hat{\epsilon} = (e - 1)\gamma/n$ (assumption 4),
assuming payoffs in $[0, 1]$. In this case, Theorem 10 bounds the expected
increase in sensor communications due to lazy renormalization by a factor of
$1 + (e - 1)/\alpha$.

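THEOREM 10. Fix a multiarmed bandit instance with possibly
adversarial payoff functions, and a MAB algorithm satisfying
the above assumptions on its distribution over actions
$\{p_v(t)\}_{v \in V}$. Let $q_v(t)$ be the corresponding random estimates
for $p_v(t)$ maintained under lazy renormalization with
oversampling parameter $\alpha$. Then for all $v$ and $t$, with $\epsilon$ and
$\hat{\epsilon}$ the constants from assumptions 2 and 4,

$$\mathbb{E}[q_v(t)/p_v(t)] \le 1 + \frac{\hat{\epsilon}}{\alpha \epsilon}$$

and

$$\mathbb{E}[q_v(t)] \le \left(1 + \frac{\hat{\epsilon}}{\alpha \epsilon}\right) \mathbb{E}[p_v(t)].$$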



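PROOF. Fix $v$, and let $p(t) := p_v(t)$, $q(t) := q_v(t)$. Here $\epsilon$ is the
lower bound on $p(t)$ from assumption 2 and $\hat{\epsilon}$ is the constant from
assumption 4. We begin by bounding $\Pr[q(t) \ge \lambda p(t)]$ for $\lambda \ge 1$. Let $t_0$ be
the most recent round in which $q(t_0) = p(t_0)$. We assume
$q(0) = p(0)$, so $t_0$ exists. Then $q(t) = p(t_0) \ge \lambda p(t)$ implies
$p(t_0)/p(t) \ge \lambda$. By assumption $p(t')/p(t' + 1) \le (1 + \hat{\epsilon})$ for
all $t'$, so $p(t_0)/p(t) \le (1 + \hat{\epsilon})^{t - t_0}$. Thus $\lambda \le (1 + \hat{\epsilon})^{t - t_0}$ and
$t - t_0 \ge \ln(\lambda)/\ln(1 + \hat{\epsilon})$. Define $t(\lambda) := \ln(\lambda)/\ln(1 + \hat{\epsilon})$.

By definition of $t_0$, there were no activations under lazy
renormalization in rounds $t_0$ through $t - 1$ inclusive, which
occurs with probability
$\prod_{t' = t_0}^{t - 1} (1 - \alpha q(t')) = (1 - \alpha q(t))^{t - t_0} \le (1 - \alpha q(t))^{\lceil t(\lambda) \rceil}$,
where $\alpha$ is the oversampling parameter
in the protocol. We now bound $\mathbb{E}[q(t)/p(t) \mid q(t)]$. Recall
that $\mathbb{E}[X] = \int_{x=0}^{\infty} \Pr[X \ge x]\, dx$ for any non-negative
random variable $X$. It will also be convenient to define
$\omega := \ln(1/(1 - \alpha q(t)))/\ln(1 + \hat{\epsilon})$ and assume for now that
$\omega > 1$. Conditioning on $q(t)$, we see that

$$\begin{aligned}
\mathbb{E}[q(t)/p(t) \mid q(t)] &= \int_{\lambda=0}^{\infty} \Pr[q(t) \ge \lambda p(t)]\, d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \Pr[q(t) \ge \lambda p(t)]\, d\lambda \\
&\le 1 + \int_{\lambda=1}^{\infty} (1 - \alpha q(t))^{t(\lambda)}\, d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \lambda^{\ln(1 - \alpha q(t))/\ln(1 + \hat{\epsilon})}\, d\lambda \\
&= 1 + \int_{\lambda=1}^{\infty} \lambda^{-\omega}\, d\lambda \\
&= 1 + \frac{1}{\omega - 1}.
\end{aligned}$$

Using $\ln\left(\frac{1}{1 - x}\right) \ge x$ for all $x < 1$ and $\ln(1 + x) \le x$ for
all $x > -1$, we can show that $\omega \ge \alpha q(t)/\hat{\epsilon}$, so
$1 + \frac{1}{\omega - 1} \le \alpha q(t)/(\alpha q(t) - \hat{\epsilon})$. Thus, if $\alpha q(t) > \hat{\epsilon}$ then $\omega > 1$ and we
obtain $\mathbb{E}[q(t)/p(t) \mid q(t)] \le \alpha q(t)/(\alpha q(t) - \hat{\epsilon})$.

If $q(t) \gg \hat{\epsilon}$, this gives a good bound. If $q(t)$ is small, we
rely on the assumption that $p(t) \ge \epsilon$ for all $t$ to get a trivial
bound of $q(t)/p(t) \le q(t)/\epsilon$. We thus conclude

$$\mathbb{E}[q(t)/p(t) \mid q(t)] \le \min\left(\alpha q(t)/(\alpha q(t) - \hat{\epsilon}),\; q(t)/\epsilon\right). \qquad \text{(B.1)}$$

Setting $q(t) = (\hat{\epsilon}/\alpha + \epsilon)$ to maximize this quantity yields an
unconditional bound of $\mathbb{E}[q(t)/p(t)] \le 1 + \hat{\epsilon}/(\alpha \epsilon)$.
To bound $\mathbb{E}[q(t)]$ in terms of $\mathbb{E}[p(t)]$, note that for all $q$

$$\begin{aligned}
q / \mathbb{E}[p(t) \mid q(t) = q] &\le \mathbb{E}[q(t)/p(t) \mid q(t) = q] \\
&\le 1 + \hat{\epsilon}/(\alpha \epsilon)
\end{aligned}$$

where the first line is by Jensen's inequality, and the second
is by equation B.1. Thus $q \le (1 + \hat{\epsilon}/(\alpha \epsilon))\, \mathbb{E}[p(t) \mid q(t) = q]$
for all $q$. Taking the expectation with respect to $q$ then proves
$\mathbb{E}[q_v(t)] \le \left(1 + \frac{\hat{\epsilon}}{\alpha \epsilon}\right) \mathbb{E}[p_v(t)]$ as claimed. □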
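The step from the conditional bound (B.1) to the unconditional bound can be checked numerically. The sketch below evaluates the right-hand side of (B.1) over a grid of values of $q$ and compares its maximum with the closed form. It writes `eps` for the lower bound on $p_v(t)$ and `eps_hat` for the per-round decay constant; these names, like the function names, are illustrative assumptions.

```python
import math


def conditional_bound(q, alpha, eps, eps_hat):
    """Right-hand side of (B.1): min(alpha*q / (alpha*q - eps_hat), q / eps)."""
    trivial = q / eps
    if alpha * q <= eps_hat:
        # The first bound is only established when alpha * q > eps_hat.
        return trivial
    return min(alpha * q / (alpha * q - eps_hat), trivial)


def worst_case_bound(alpha, eps, eps_hat):
    """Maximum of (B.1) over q, attained where the two bounds cross."""
    return 1.0 + eps_hat / (alpha * eps)
```

The two terms inside the minimum cross at `q = eps_hat / alpha + eps`; below that point the trivial bound is smaller and increasing in `q`, above it the first bound is smaller and decreasing, so the crossing point gives the worst case.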

LEMMA 1. EXP3 with $\eta = \gamma/n$ satisfies the conditions
of Theorem 10 with $\epsilon = \gamma/n$ and $\hat{\epsilon} = (e - 1)\gamma/n$.

PROOF. The former equality is an easy observation. To
prove the latter equality, fix a round $t$ and a selected action
$v$. Let $w_v(t)$ be the weight of $v$ in round $t$, and $W(t)$ be
the total weight of all actions in round $t$. Let $\pi$ be the payoff
to $v$ in round $t$. Given the update rule
$w_v(t + 1) = w_v(t) \exp\left(\eta \frac{\pi}{p_v(t)}\right)$, only the probabilities of the other
actions will be decreased. It is not hard to see that they will be
decreased by a multiplicative factor of at most $W(t)/W(t + 1)$,
no matter what the learning parameter $\gamma$ is. By the update
rule,

$$W(t + 1) = W(t) + w_v(t) \left( \exp\left(\eta \frac{\pi}{p_v(t)}\right) - 1 \right).$$

Let $p := p_v(t)$ and $x := \eta \pi$. Dividing the above equation by
$W(t)$, we get

$$\begin{aligned}
\frac{W(t + 1)}{W(t)} &= 1 + p \left( \exp(x/p) - 1 \right) && \text{(B.2)} \\
&\le 1 + p \left( x/p + (e - 2)(x/p)^2 \right) && \text{(B.3)} \\
&\le 1 + x + (e - 2) x^2 / p && \text{(B.4)}
\end{aligned}$$

where in the second line we have used $e^x \le 1 + x + (e - 2) x^2$
for $x \in [0, 1]$. Note $\pi \le 1$ implies $x \le \gamma/n \le p$, so
$W(t + 1)/W(t) \le 1 + (e - 1) x \le 1 + (e - 1)\gamma/n$. It follows that setting
$\hat{\epsilon} = (e - 1)\gamma/n$ is sufficient to ensure $p_v(t + 1) \ge p_v(t)/(1 + \hat{\epsilon})$
for all $t$. □
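As a sanity check on Lemma 1, the following Python sketch performs a single EXP3 update on a small instance and confirms both conditions on that instance. It assumes the standard EXP3 form in which the sampling probability mixes the normalized weights with uniform exploration, $p_v = (1 - \gamma) w_v / W + \gamma/n$; the function names and the example weights are illustrative, and a spot check of this kind is not a substitute for the proof.

```python
import math


def exp3_probabilities(weights, gamma):
    """EXP3 mixture (assumed form): (1 - gamma) * w_v / W + gamma / n."""
    total = sum(weights)
    n = len(weights)
    return [(1.0 - gamma) * w / total + gamma / n for w in weights]


def exp3_update(weights, gamma, chosen, payoff):
    """Reweight the chosen action by exp(eta * payoff / p_chosen), eta = gamma / n."""
    probs = exp3_probabilities(weights, gamma)
    eta = gamma / len(weights)
    updated = list(weights)
    updated[chosen] *= math.exp(eta * payoff / probs[chosen])
    return updated
```

With `gamma = 0.1` and four actions, every probability stays at or above `gamma / n`, and after an update with a payoff in $[0, 1]$ no probability drops by more than a factor of `1 + (e - 1) * gamma / n`.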

We now prove Theorem 4 and Corollary 5. 

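Proof of Theorem 4. We prove in Lemma 1 that EXP3
satisfies the conditions of Theorem 10 with $\epsilon = \gamma/n$ and
$\hat{\epsilon} = (e - 1)\gamma/n$. Thus by Theorem 10

$$\mathbb{E}\Big[\sum_v q_v(t)\Big] \le (1 + (e - 1)/\alpha)\, \mathbb{E}\Big[\sum_v p_v(t)\Big] = (1 + (e - 1)/\alpha)$$

because $\sum_v p_v(t) = 1$. Each sensor $v$ activates with probability
$\alpha q_v(t)$, so the expected number of activations is

$$\mathbb{E}\Big[\alpha \sum_v q_v(t)\Big] \le \alpha (1 + (e - 1)/\alpha).$$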



That proves the claimed bounds in expectation. To prove
bounds with high probability, note that a sensor activates
with probability $\alpha q_v(t)$ in round $t$, where $q_v(t)$ is a random
variable. Fix $t$. Let $[\mathcal{E}]$ denote the indicator variable for the
event $\mathcal{E}$, i.e., $[\mathcal{E}] = 1$ if $\mathcal{E}$ occurs, and $[\mathcal{E}] = 0$ otherwise.
Then we can write $[v \text{ activates in round } t] = [\alpha q_v(t) \ge R]$,
where $R$ is sampled uniformly at random from $[0, 1]$ and $R$
is independent of $q_v(t)$. Then if $f_R$ is the probability density
function of $R$ we can write

$$\begin{aligned}
\Pr[R \le \alpha q_v(t)] &= \int_{r=0}^{1} \Pr[\alpha q_v(t) \ge R \mid R = r]\, f_R(r)\, dr \\
&= \int_{r=0}^{1} \Pr[\alpha q_v(t) \ge r]\, f_R(r)\, dr \\
&= \int_{r=0}^{1} \Pr[\alpha q_v(t) \ge r]\, dr \\
&= \mathbb{E}[\alpha q_v(t)].
\end{aligned}$$

Thus the number of sensor activations is a sum of $|V|$ binary
random variables with cumulative mean $\mu := \sum_v \mathbb{E}[\alpha q_v(t)]$.
We have already bounded this mean as $\mu \le \alpha + (e - 1)$. From
here a simple application of a Chernoff-Hoeffding bound
suffices to prove that with high probability this sum is at most
$O(\alpha + \log n)$. Let $A$ be the number of sensor activations.
Then, e.g., Theorem 5 of [?] immediately implies

$$\Pr[A \ge \mu(1 + \delta)] \le \exp\left(-\frac{\mu \delta^2}{2(1 + \delta/3)}\right).$$

For $\delta \ge 1$, this yields $\Pr[A \ge \mu(1 + \delta)] \le \exp\left(-\frac{3 \mu \delta}{8}\right)$.
Setting $\delta = 1 + \frac{8 c \ln n}{3 \mu}$ ensures this probability is at most
$n^{-c}$, hence $\Pr[A \ge 2\mu + \frac{8}{3} c \ln n] \le n^{-c}$. Noting that $\mu \le
\alpha + (e - 1)$ completes the high probability bound on the
number of activations.

As for the number of messages, note that each message 
involves a sensor as sender or receiver, and by inspection the 
protocol only involves two messages per activated node. □ 
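The arithmetic behind the high-probability bound can be reproduced with a short Python sketch. It assumes a Bernstein-style Chernoff tail of the form $\exp(-\mu \delta^2 / (2(1 + \delta/3)))$, a standard form that may differ in constants from the theorem cited in the proof, and it uses the bound $\mu \le \alpha + (e - 1)$ as if it held with equality. The function names are illustrative.

```python
import math


def chernoff_tail(mu, delta):
    """Assumed tail bound exp(-mu * delta^2 / (2 * (1 + delta / 3)))
    on Pr[A >= mu * (1 + delta)]."""
    return math.exp(-mu * delta ** 2 / (2.0 * (1.0 + delta / 3.0)))


def activation_threshold(alpha, n, c):
    """Cap 2 * mu + (8/3) * c * ln(n), taking mu = alpha + (e - 1)."""
    mu = alpha + (math.e - 1.0)
    return 2.0 * mu + (8.0 / 3.0) * c * math.log(n)
```

For example, with `delta = 1 + 8 * c * math.log(n) / (3 * mu)` the value `chernoff_tail(mu, delta)` stays below `n ** (-c)`, and `mu * (1 + delta)` coincides with `activation_threshold(alpha, n, c)`.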

Proof of Corollary 5. Use the distributed EXP3 protocol
with lazy renormalization with $\alpha = \ln n$. We have
already established that the probability of nothing being selected
is $e^{-\alpha}$, or $1/n$ in this case. If nothing is selected,
send out $n$ messages, one to each sensor, to rerun the protocol.
The expected number of messages sent to initiate additional
runs of the protocol is $n \times \sum_{x=1}^{\infty} x\, n^{-x} = (1 - 1/n)^{-2} =
1 + O(1/n)$. Let $X$ be the number of sensor activations. As in
the proof of Proposition 8, if $Y$ is the number of sensor
activations without rerunning the protocol when nothing
is selected, then $\mathbb{E}[X] = \mathbb{E}[Y]/\Pr[Y \ge 1]$. By Theorem 4
$\mathbb{E}[Y] \le \alpha (1 + (e - 1)/\alpha)$. Since $\Pr[Y \ge 1] = 1 - e^{-\alpha}$,
we conclude

$$\mathbb{E}[X] \le \ln n + (e - 1) + O\left(\frac{\ln n}{n}\right).$$

The with-high-probability bounds on the number of sensor
activations are proved as in the proof of Theorem 4.

As for the number of messages, note that other than mes- 
sages sent to initiate additional runs of the protocol, there 
are only two messages per activated node. Finally, the regret 
bounds for distributed EXP3 are the same as standard EXP3 
because by design the two algorithms select sensors from 
exactly the same distribution in each round. Note that the 
distribution in any given round is a random object depending 
on the algorithm's choices in the previous rounds, however 
on each round the distribution on distributions is the same for 
both EXP3 variants, as can be readily proved by induction 
on the round number. □ 
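The two quantities computed in this proof are easy to evaluate numerically. The Python sketch below sums the restart-message series $n \sum_{x \ge 1} x\, n^{-x}$ directly and evaluates $(\alpha + e - 1)/(1 - e^{-\alpha})$ at $\alpha = \ln n$; the function names and the truncation length are illustrative assumptions.

```python
import math


def expected_restart_messages(n, terms=200):
    """Truncated series n * sum_{x >= 1} x * n^{-x}."""
    total = 0.0
    power = 1.0
    for x in range(1, terms + 1):
        power /= n           # power == n^{-x}
        total += x * power
    return n * total


def expected_activations_bound(n):
    """(alpha + e - 1) / (1 - exp(-alpha)) evaluated at alpha = ln(n)."""
    alpha = math.log(n)
    return (alpha + math.e - 1.0) / (1.0 - math.exp(-alpha))
```

The first function agrees with the closed form $(1 - 1/n)^{-2}$, and the second exceeds $\ln n + (e - 1)$ by exactly $(\ln n + e - 1)/(n - 1)$, which is the $O(\ln n / n)$ correction term.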

C. ALGORITHM OG UNIT WITH FAULTY ACTIONS

In order to prove Theorem 7, we need a guarantee on the
performance of OG UNIT if its elements may fail to give
any benefit. We provide this in the form of Theorem 11.

Suppose we run DOG with the Poisson Multinomial Sampling
Protocol with lazy renormalization, and do not resample
on stages where no sensor activates. Then with some probability
during any given stage $i \in [k]$, no sensors activate and the
server receives no information. Suppose that this probability
is at most $\delta$ in each stage. We have shown in Section 4.1 that
$\delta \le e^{-\alpha}$ where $\alpha$ is the oversampling parameter. We claim
that we can compensate for this possibility by running DOG
for $k/(1 - \delta)$ stages in each round rather than $k$, because of
the following guarantee for OG UNIT.

THEOREM 11. Fix a finite set $V$, $k \in \mathbb{N}$, and a sequence
of monotone submodular functions $f_1, \ldots, f_T : 2^V \to [0, 1]$.
Let $\mathrm{OPT}_k = \max_{S \subseteq V, |S| \le k} \sum_{t=1}^{T} f_t(S)$. For all $v \in V$ let
$v'$ be a random element which is $v$ with probability $1 - \delta_v$ and
is null^7 with probability $\delta_v$. Let $f'_t(S') := \mathbb{E}[f_t(S)]$, where
$S$ is the set obtained by including every element $v'$ of $S'$ in
it independently with probability $1 - \delta_v$. Let $S'_1, \ldots, S'_T$ be the
sequence of random sets obtained from running OG UNIT with
actions $V' := \{v' : v \in V\}$ and objective functions $\{f'_t\}_{t=1}^{T}$
and $k' = \lceil k/(1 - \delta) \rceil$ stages, where $\delta = \max_v \delta_v$. Suppose
the algorithms for each stage have expected regret at most $r$.
Then

$$\mathbb{E}\left[\sum_{t=1}^{T} f'_t(S'_t)\right] \ge \left(1 - \frac{1}{e}\right) \mathrm{OPT}_k - \left\lceil \frac{k}{1 - \delta} \right\rceil r.$$

7 Here, a null element always contributes nothing in the way of utility,
so that $f_t(S \cup \{\text{null}\}) = f_t(S)$ for all $t$ and $S$.

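Proof. It suffices to prove the analogous result in the
offline case; the "meta-actions" analysis in [?] can then be
used to complete the proof. So consider a set of elements $V$
and the "faulty" versions $V'$. Fix a monotone submodular
$f : 2^V \to [0, 1]$ and define $f'$ as above. Run the offline
greedy algorithm on $f'$ to try to find the best set of $k' = \lceil k/(1 - \delta) \rceil$
elements in $V'$. Let $g'_i$ be the chosen element in stage $i$, and
let $G'_i = \{ g'_j : 1 \le j \le i \}$. Let $G_i$ denote the realization of
$G'_i$ after sampling, so that $G_i \subseteq \{g : g' \in G'_i\}$. Let
$S^* = \arg\max_{S \subseteq V, |S| \le k} f(S)$. We claim that for all $i$

$$\mathbb{E}[f(G'_{i+1}) - f(G'_i) \mid G_i] \ge (1 - \delta)\, \frac{f(S^*) - f(G_i)}{k}$$

because

$$\begin{aligned}
f(S^*) - f(G_i) &\le f(G_i \cup S^*) - f(G_i) \\
&\le \sum_{v \in S^*} \left( f(G_i + v) - f(G_i) \right) \\
&\le k \cdot \max_v \left( f(G_i + v) - f(G_i) \right)
\end{aligned}$$

and $\max_{v'} \left( \mathbb{E}[f(G'_i + v') - f(G'_i) \mid G_i] \right)$ is at least equal to
$(1 - \delta) \max_v \left( f(G_i + v) - f(G_i) \right)$. Removing the conditioning
on $G_i$ we get

$$\mathbb{E}[f(G'_{i+1}) - f(G'_i)] \ge (1 - \delta)\, \frac{f(S^*) - \mathbb{E}[f(G'_i)]}{k}.$$

Let $\Phi(i) = f(S^*) - \mathbb{E}[f(G'_i)]$. The previous equation implies
$\Phi(i + 1) \le \Phi(i) \left(1 - \frac{1 - \delta}{k}\right)$. By induction
$\Phi(i) \le f(S^*) \left(1 - \frac{1 - \delta}{k}\right)^i$. Using $1 - x \le e^{-x}$ we conclude that
$\Phi(\lceil k/(1 - \delta) \rceil) \le f(S^*)/e$ and $f'(G'_{k'}) \ge \left(1 - \frac{1}{e}\right) f(S^*)$. □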
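The final step of this argument, that $\lceil k/(1 - \delta) \rceil$ stages suffice to shrink the residual by a factor of $1/e$, can be checked numerically. The sketch below computes the number of stages and the resulting geometric factor for a few values of $k$ and $\delta$; the function names are illustrative assumptions.

```python
import math


def stages_needed(k, delta):
    """k' = ceil(k / (1 - delta)) stages when each stage fails w.p. at most delta."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return math.ceil(k / (1.0 - delta))


def residual_after(stages, k, delta):
    """Upper bound (1 - (1 - delta) / k) ** stages on Phi(stages) / f(S*)."""
    return (1.0 - (1.0 - delta) / k) ** stages
```

For instance, with `k = 10` and `delta = 0.5` the greedy algorithm runs for 20 stages, and `residual_after(20, 10, 0.5)` is below `1 / math.e`, so the expected value retained is at least a $(1 - 1/e)$ fraction of $f(S^*)$.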

We are now ready to prove Theorem 7. 

Proof of Theorem 7. To bound the number of sensor activations,
we note there are $k' := \lceil k/(1 - e^{-\alpha}) \rceil$ stages, and
each stage activates at most $\alpha + (e - 1)$ sensors in expectation
and $O(\alpha + \log n)$ sensors with high probability by Theorem 4
(which proves these bounds in the higher communication case
where we do rerun the PMS Protocol if nothing is
selected). This, and the fact that $\alpha/(1 - e^{-\alpha}) = O(\alpha)$ for
$\alpha > 0$, yields the claimed activation bounds. It is an easy
observation that the number of messages is at most twice
the number of activations. Clearly, at most one sensor per
stage is selected, so at most $k'$ are selected over one round.
Finally, the regret bound follows from Theorem 11, using
$\delta = e^{-\alpha}$. □



